Executive Summary
In November of 2016, a joint 18-month study from the State Street Center for Applied Research and the CFA Institute, which combined 200 in-depth interviews with global industry leaders and a survey of 3,300 "investment professionals", was released, and claimed to have discovered "the hidden variable of [organizational] performance". Dubbed "Phi", the research suggested that a combination of the motivational forces of Purpose, Habits, and Incentives was associated with superior organizational outcomes; specifically, the results showed that a one-point increase in Phi was associated with 28% greater odds of excellent organizational performance, 55% greater odds of excellent client satisfaction, and 57% greater odds of excellent employee engagement.
In this guest post, Derek Tharp – our new Research Associate at Kitces.com, and a Ph.D. candidate in the financial planning program at Kansas State University – analyzes the Phi study and delves deeper into exactly what it is, how it is measured, and what conclusions should (and shouldn't) be drawn from this study.
Because while the researchers did find a statistically significant relationship between Phi (as measured by self-reported purpose, habits, and incentives), and organizational outcomes, client satisfaction, and employee engagement, it's not entirely clear what the results actually prove.
The challenge, in part, is that the study itself didn't actually evaluate organizational performance, client satisfaction, or employee engagement! Instead, the researchers evaluated self-reported assessments of organizational performance, client satisfaction, and employee engagement, and compared them to self-reported statistics on Phi (for the entire organization). And this distinction is meaningful, because it isn't exactly clear that employees can make valid or reliable assessments of company-wide motivational forces, or of the entire company's organizational performance, client satisfaction, or employee engagement, especially when these employees may not even know if clients' goals are being met (how "organizational performance" was defined), whether clients are satisfied, or whether other employees within a large multinational financial services firm are actually engaged!
In addition, there's the fundamental challenge that a correlation between two data points doesn't necessarily prove causation either. In other words, the fact that high-performing organizations also report higher Phi doesn't mean the higher Phi caused the better results. Perhaps organizations that are already outperforming simply attract more motivated employees (which means the causality goes the other direction).
In the end, the Phi study did ask some interesting questions and set out with some lofty ambitions to evaluate the impact of motivational forces - a question that is admittedly not easy to study. But until further research can tie Phi to actual outcomes or provide evidence that the self-reported measures are assessing something meaningful in a valid and reliable way, then the conclusions we can actually draw from the Phi research are limited.
What Is Phi?
What is Phi? The name was derived from motivational forces – Purpose, Habits, and Incentives – examined in an 18-month study conducted by the State Street Center for Applied Research and the CFA Institute. The researchers combined findings from 200 in-depth interviews with global industry leaders and a survey of 3,300 “investment professionals” (which was broadly comprised of various executives and employees of asset managers, asset owners, financial advisors, central bankers, regulators, policy makers, and others) from 20 different countries. (The study also surveyed individual investors, but this discussion focuses solely on the responses of investment professionals.)
The study sought to examine the question: “How can we leverage motivation to achieve better financial outcomes?” For the purposes of analyzing investment professionals in this study, “outcomes” included measures of organizational performance, client satisfaction, and employee engagement for each investment professional’s own organization.
From their analysis, the authors conclude:
“We discovered a previously hidden variable [Phi] with a statistically significant relationship to long-term organizational performance, client satisfaction, and employee engagement.”
Specifically, the researchers found that a one-point increase in motivational forces, as measured by “Phi”, was associated with 28% greater odds of excellent organizational performance, 55% greater odds of excellent client satisfaction, and 57% greater odds of excellent employee engagement. The implication is that financial (and perhaps other) organizations that better focus on supporting and improving their employees’ purpose, habits, and incentives within the firm may achieve better business results, which could have substantial implications for best practices in employee management.
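One subtlety worth spelling out is that "greater odds" is not the same thing as "greater probability". The short Python sketch below converts the reported odds figures into probabilities under an assumed baseline. The 20% baseline and the function name apply_odds_ratio are hypothetical choices made purely for illustration; the study's actual baseline rates are not given here.

```python
def apply_odds_ratio(baseline_probability: float, odds_ratio: float) -> float:
    """Return the probability implied by multiplying the baseline odds by odds_ratio."""
    if not 0.0 < baseline_probability < 1.0:
        raise ValueError("baseline_probability must be strictly between 0 and 1")
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Odds ratios implied by the reported "X% greater odds" figures.
REPORTED_ODDS_RATIOS = {
    "organizational performance": 1.28,
    "client satisfaction": 1.55,
    "employee engagement": 1.57,
}

# Hypothetical baseline: assume 20% of respondents rate an outcome "excellent".
# This number is an illustration only; it does not come from the Phi study.
HYPOTHETICAL_BASELINE = 0.20

if __name__ == "__main__":
    for outcome, ratio in REPORTED_ODDS_RATIOS.items():
        new_p = apply_odds_ratio(HYPOTHETICAL_BASELINE, ratio)
        print(f"{outcome}: {HYPOTHETICAL_BASELINE:.0%} -> {new_p:.1%}")
```

With the assumed 20% baseline, 28% greater odds moves the probability of an "excellent" rating to roughly 24%, not to 25.6% (which is what a 28% increase in probability would imply). The gap between the two readings widens as the baseline rises.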
To better understand what the Phi researchers actually analyzed, though, we have to look deeper at their methodology.
What Was Phi Actually Designed To Measure?
The Phi study examined whether an investment professional's perceptions of 10-year organizational performance (achieving their clients’ goals and investment goals over the past 10 years), client satisfaction, and employee engagement – each measured on a scale of 1 to 5 – were correlated with the employee's self-reported Phi.
Phi itself was calculated based on responses to questions intended to measure an investment professional's sense of purpose, habits, and incentives (responses which increased Phi are indicated in bold below):
1. Purpose: What motivates you to perform generally and in your current role? (Select top three)
a. The hope of receiving a big bonus/salary increase.
b. The fact that everyone can see my performance and I do not want to look bad.
c. I know it is important to fulfill the end client’s goals.
d. The feeling of doing something in the service of something larger than myself (e.g. creating a better life situation for the end client, supporting the values of my organization to achieve long-term organizational growth).
e. I just love what I do and would continue doing it even if I was not paid.
2. Habits: What is the reason that you are still working in the investment management industry? (Select up to two)
a. I am reasonably satisfied with my job.
b. It is where the money is, i.e. where I can earn the most.
c. I like the status that a job in this industry brings.
d. I am passionate about the markets.
e. I am inspired by a family member/industry figure.
f. I can help people and organizations achieve their financial goals.
g. I like working with very smart people.
h. I help facilitate economic growth and development.
i. It would be too difficult to change jobs and pursue a new career in another industry.
j. I am thinking about quitting.
3. Incentives: Which description most closely matches the way you think about your work?
a. As a job (I work only for the sake of the money. I am really happy when the weekend comes and I satisfy my intellectual curiosity and interests via hobbies and not work.)
b. As a career (My work energizes me, and my aim is to advance and get promoted. I sometimes bring work with me home since I want to deliver excellent results. Sometimes I do however wonder about the meaning and importance of what I do.)
c. As a calling (I am devoted to my work. When working, I feel that I am part of something larger than myself. The value my efforts bring is clear to me, and I never question the meaning of what I do. I would continue to work even if I was independently wealthy.)
To summarize, the researchers were examining whether self-assessed measures of motivational forces (Phi) of investment professionals were correlated with self-assessed perceptions of organizational performance. And, ultimately, their analysis did find a statistically significant correlation between Phi and the self-assessed outcomes of organizational performance, client satisfaction, and employee engagement.
Should We Trust The Findings On Phi?
Undoubtedly, the conclusions reported in the Phi study are intriguing. The ability to utilize a single measure to predict enhancements in organizational performance, client satisfaction, and employee engagement would be of interest to many. It could help predict which companies will outperform, and provide guidance to managers about what they need to do to improve organizational results.
Unfortunately, however, it appears that some of these findings have been misreported, and not just in publications like the New York Times, Financial Advisor, and Wealth Management, but within the actual CAR/CFA Institute study of Phi itself.
The problem is that the Phi study doesn’t actually tell us anything about organizational performance, client satisfaction, or employee engagement. Rather, the study measures self-assessed perceptions of organizational performance, client satisfaction, and employee engagement. The distinction is important, but to understand why it is potentially problematic, it’s helpful to explore a few concepts that are essential to conducting research.
Validity And Reliability
One challenge in all empirical research is taking abstract ideas and converting them into something concrete that we can observe and measure. Researchers call this process “operationalization”. A construct is considered to be “valid” when it actually measures the concept we intend to measure, and a measure is “reliable” if it will consistently provide the same value when measured multiple times.
Think of a bathroom scale. If each time you step on a scale it gives you both your correct weight and a consistent measurement, then we would say the scale is both valid and reliable. However, if you stepped on the scale and it gave you your weight with a margin of error of ± 50 pounds, we would say the scale is valid but not reliable (i.e., it is measuring your weight, it’s just not providing a consistent measurement). Alternatively, if you stepped on the scale and it told you the temperature of your feet, we would say the scale is not valid (regardless of whether it is reliable), because it’s not capturing the abstract idea (weight) that we want it to.
A common way to visualize this is with a dartboard:
As you can infer from above, it’s possible for a measure to be valid and reliable, valid but not reliable, reliable but not valid, or neither valid nor reliable.
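For readers who prefer numbers to pictures, the bathroom scale example can also be sketched as a small simulation. Everything in it is an assumption made for illustration: the true weight of 150 pounds, the noise levels, and the idea that an invalid scale reports a foot temperature near 90 degrees. "Bias" (how far the average reading is from the true weight) stands in for validity, and "spread" (the standard deviation of readings) stands in for reliability.

```python
import random
import statistics

TRUE_WEIGHT = 150.0  # hypothetical true weight in pounds


def summarize(readings, target):
    """Return (bias, spread): average error against the target, and standard deviation."""
    bias = statistics.fmean(readings) - target
    spread = statistics.stdev(readings)
    return bias, spread


def simulate_scales(n=1000, seed=0):
    """Simulate n readings from three hypothetical scales and summarize each."""
    rng = random.Random(seed)
    scales = {
        # Centered on the true weight with very little noise.
        "valid and reliable": [rng.gauss(TRUE_WEIGHT, 0.5) for _ in range(n)],
        # Centered on the true weight, but individual readings swing widely.
        "valid but not reliable": [rng.gauss(TRUE_WEIGHT, 25.0) for _ in range(n)],
        # Very consistent, but reporting something other than weight
        # (here, an assumed foot temperature of about 90 degrees).
        "reliable but not valid": [rng.gauss(90.0, 0.5) for _ in range(n)],
    }
    return {name: summarize(values, TRUE_WEIGHT) for name, values in scales.items()}


if __name__ == "__main__":
    for name, (bias, spread) in simulate_scales().items():
        print(f"{name}: bias={bias:+.1f}, spread={spread:.1f}")
```

A small bias with a small spread corresponds to darts clustered on the bullseye, a small bias with a large spread to darts scattered around it, and a large bias with a small spread to darts tightly clustered in the wrong place.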
Validity And Reliability Of Phi
Within the context of the Phi study, understanding validity and reliability gives us reason to be skeptical of the researchers’ findings.
They claim that Phi predicts excellent client satisfaction, but they never directly measure client satisfaction. They claim that Phi predicts excellent organizational performance, but they never actually measure organizational performance. And they claim that Phi predicts employee engagement, but they never measure employee engagement. Everything the researchers measured was based on self-assessments, but it’s not clear we should trust self-assessed measures in this context. The fact that an employee believes that their organization is achieving their client’s goals, believes that their clients are satisfied, or believes that employees are engaged, is not actually evidence that they are.
It’s worth acknowledging that researchers often don’t have access to the precise information they want to evaluate, and have to settle for something less. Proxies are commonly used in place of the variables that we’re really interested in. But, at a minimum, proxies should pass a test of “face validity” (i.e., the degree to which they appear to be valid on their face) or have been found to be closely correlated with the variable of interest. In this study, it’s not clear that this is the case.
For instance, should we trust employee evaluations of client satisfaction? At the extremes, employees probably have some sense of client satisfaction (e.g., the employees in a call center of your local internet service provider probably have a different perception of customer satisfaction than the employees of a beloved local restaurant). But what about more granular differences? Should we trust that a diverse group of employees from mutual funds, banks, and other financial institutions have a good (and reliable) sense of client satisfaction – especially when not all of those employees are even client-facing in the first place?
Similarly, should we trust employee evaluations of organizational performance? Employees were asked to specifically focus on “their organizations’ performance in terms of achieving their clients’ goals and investment goals over the past 10 years”, but again, how well does an average employee know whether their organization is achieving their clients’ goals? Particularly when not all of those employees may even be client-facing or would even know exactly what their clients’ goals are in the first place, much less whether those goals are being achieved!
And, additionally, should we trust employee evaluations of employee engagement? For smaller firms where everyone knows everyone else, this may be possible, but what about within a large, multinational financial services firm? Can any particular employee really have a good understanding of employee engagement across the whole firm? That would seem to be asking a lot. Instead, it seems more likely that evaluations might be localized. For example, employees in a call center likely feel different than employees in management who likely feel different than employees in sales. And this is assuming employees are providing valid assessments in the first place, rather than substituting in a different question, such as, “What is my satisfaction as an employee?”
It is worth noting that just because a measure is self-assessed does not mean that it’s invalid or unreliable. Measures of intangible concepts such as subjective well-being and self-reported financial satisfaction have been found to be both valid and reliable. Self-Determination Theory and BIS/BAS (Behavioral Inhibition System / Behavioral Approach System) are both identified as providing theoretical guidance for this study. While I can't claim to be an expert in this field, my own review of the literature in these areas did suggest valid and reliable measures are available for measuring purpose, habits, and incentives, but those scales tend to be far more thorough than what was used in this survey. Further, I could not find evidence supporting the use of the self-assessments of organizational performance, client satisfaction, or employee engagement utilized in this study.
In social science, good measures are subjected to repeated testing to assess reliability and validity. Of course, every measure must be used for the first time at some point, so we can't necessarily fault the Phi researchers for trying something new, but no tests or justifications of reliability or validity were reported. As a result, it is unclear what, if any, prior usage and evaluation these measures have been subjected to. There is no doubt that creating surveys is tough and the point here isn't to belittle the challenge the Phi researchers had in front of them, but ultimately, questions about validity and reliability are crucially important.
To put in perspective just how hard avoiding problems of validity and reliability can be, even when researchers publishing in top-tier academic journals are going to great lengths to avoid validity problems, they will still sometimes fall short. A paper on consumer expectations and the measurement of perceived service quality from Teas (1993) provides a good example of just how thorough and careful researchers need to be when evaluating measures related to consumer (client) satisfaction. While marketing studies have found support for the idea that consumer satisfaction is at least partially determined by “the degree and direction of discrepancy between consumers’ perceptions and expectations”, Teas (1993) notes that there are at least six different conceptions of “expectations” in marketing research (forecasted performance, deserved performance, equitable performance, minimum tolerable performance, ideal performance, and service attribute importance), each of which may result in serious measurement validity problems.
Ultimately, if even careful, sophisticated peer-reviewed studies evaluating consumer perceptions of consumer (client) satisfaction can run into serious validity issues, then there are reasons to be concerned with the validity of more simplistic measures of employee perceptions of consumer (client) satisfaction.
What Could Explain Phi?
Notwithstanding these concerns, the study did find a statistically significant relationship between Phi and the self-assessed measures of organizational performance, client satisfaction, and employee engagement. Which raises the question: what might actually explain the relationship that was observed?
Omitted-Variable Bias
It’s unclear what, if any, demographic controls (such as age, income, race, or gender) were included in this study. In high-quality research, theory should guide what demographic variables may be important to the research question at hand. The theory presented in this study is underdeveloped (or under-reported) from the perspective of evaluating how we should expect demographic factors to influence these particular research questions.
However, hypothetically, suppose younger respondents displayed higher levels of Phi relative to older respondents, simply because they’re more enthusiastic about their career at this early stage. Further, suppose that younger respondents were more likely to score their organization favorably in terms of client satisfaction, due to that same optimism.
This would mean organizations which skew towards younger employees would be associated with higher levels of Phi and higher client satisfaction, but the result wouldn't necessarily be because Phi predicted client satisfaction. Rather, it may simply be because being young happened to predict both. Under such circumstances, it’s possible that controlling for age would eliminate the statistical significance found between Phi and client satisfaction (i.e., age was an omitted variable).
In all likelihood, what's going on in these findings is probably not that straightforward, but the key point is that responses to Phi questions could simply correlate strongly with other demographic variables (or some other factors), making Phi merely a proxy for something else left out of the model. As a result, omitted-variable bias could be overstating or understating the association between Phi and the outcome variables.
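The hypothetical age example can be made concrete with a small simulation. In the Python sketch below, a made-up "youth" factor drives both Phi and rated client satisfaction, while Phi has no direct effect on satisfaction at all. The variable names, effect sizes, and noise levels are all assumptions for illustration and are not estimates from the Phi study.

```python
import random


def mean(values):
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residualize(ys, zs):
    """Remove the linear effect of zs from ys (simple least-squares residuals)."""
    my, mz = mean(ys), mean(zs)
    slope = (sum((z - mz) * (y - my) for y, z in zip(ys, zs))
             / sum((z - mz) ** 2 for z in zs))
    return [(y - my) - slope * (z - mz) for y, z in zip(ys, zs)]


def simulate(n=5000, seed=1):
    """Return (raw, controlled) correlations between Phi and rated satisfaction."""
    rng = random.Random(seed)
    # Hypothetical "youthful optimism" factor, standing in for age.
    youth = [rng.gauss(0, 1) for _ in range(n)]
    # Phi and rated satisfaction each depend on youth plus independent noise.
    # Phi has NO direct effect on satisfaction in this simulated world.
    phi = [y + rng.gauss(0, 1) for y in youth]
    satisfaction = [y + rng.gauss(0, 1) for y in youth]
    raw = pearson(phi, satisfaction)
    # "Controlling for" youth: correlate what is left after removing its effect.
    controlled = pearson(residualize(phi, youth), residualize(satisfaction, youth))
    return raw, controlled


if __name__ == "__main__":
    raw, controlled = simulate()
    print(f"raw correlation: {raw:.2f}, controlling for youth: {controlled:.2f}")
```

In this simulated world, the raw correlation between Phi and satisfaction is substantial, yet it falls to approximately zero once the omitted factor is controlled for. That is the pattern we would want the researchers to rule out.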
Social Desirability Bias
The tendency to answer questions in a manner that we believe would be viewed favorably by others is a well-known problem within survey research. This is known as the social desirability bias, and it is particularly problematic for questionnaires using self-assessments.
In looking at the Phi questions specifically, it appears that both the questions underlying the predictor variable (Phi) and the outcome variables (self-assessed organizational performance, client satisfaction, and employee engagement) may be susceptible to social desirability bias. For instance, consider the question related to “purpose”:
1. Purpose: What motivates you to perform generally and in your current role? (Select top three)
a. The hope of receiving a big bonus/salary increase.
b. The fact that everyone can see my performance and I do not want to look bad.
c. I know it is important to fulfill the end client’s goals.
d. The feeling of doing something in the service of something larger than myself (e.g. creating a better life situation for the end client, supporting the values of my organization to achieve long-term organizational growth).
e. I just love what I do and would continue doing it even if I was not paid.
There’s a strong case that some of these answers are more socially desirable than others. If you are unsure, imagine you’re at a cocktail party and one person introduces themselves saying (a) while another introduces themselves saying (d). Regardless of how you personally feel about those responses, do you think one might be perceived more favorably by a crowd of people? If so, it’s reasonable to suspect social desirability bias could be a factor here, and it seems possible similar dynamics exist within the questions related to habits and incentives.
In addition, the outcome variables in this study (self-assessed organizational performance, client satisfaction, and employee engagement) could be subject to social desirability bias as well. Being critical of your own organization’s performance, client satisfaction, or employee engagement could be perceived as anti-social responses. Of course, social desirability bias could bias responses in the other direction, too. For instance, employees could express solidarity with one another through criticism of an organization (similarly, some social environments may reward monetary motives while punishing altruistic motives). But the key point is that some factor other than what we’re trying to examine (motivational forces) may be influencing responses. And because not all individuals are equally susceptible to social desirability bias, this bias could partially explain the study’s findings.
Correlation Vs Causation
Anyone who has taken a statistics class can probably recall the old adage that “correlation does not imply causation”. That reminder comes in handy here. Though much of the framing within the Phi report (and coverage in the financial press) suggests a causal connection between Phi and organizational outcomes, this study found, at best, a correlational relationship.
Supposing all of the measures in this study are both valid and reliable, we still can’t say anything about the direction of the relationship between Phi and self-assessed organizational performance, client satisfaction, and employee engagement. There are many different types of causal links, and it remains possible that the causal link, if any, is in the opposite direction (e.g., outcomes predict Phi).
For instance, perhaps organizational success and having happier clients results in employees feeling more engaged and having a greater purpose. Which could mean the better organizational and client outcomes are causing people to report higher Phi, not that Phi is leading to better organizational and client outcomes. Or, alternatively, perhaps already-successful organizations simply attract more motivated employees, who then self-report their higher Phi. But again, that means (hypothetically) it may not be that higher Phi causes better business outcomes, but that better business outcomes cause higher Phi!
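To see why a correlation alone can't settle the direction question, consider the following Python sketch, which simulates two opposite worlds: one where Phi drives outcomes, and one where outcomes drive Phi. The effect size of 0.5 and the noise levels are arbitrary assumptions chosen for illustration, not values taken from the study.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_direction(n=5000, seed=2):
    """Return the Phi/outcome correlation in two worlds with opposite causality."""
    rng = random.Random(seed)
    # World A: Phi comes first, and outcomes depend on Phi.
    phi_a = [rng.gauss(0, 1) for _ in range(n)]
    outcome_a = [0.5 * p + rng.gauss(0, 1) for p in phi_a]
    # World B: outcomes come first, and Phi depends on outcomes.
    outcome_b = [rng.gauss(0, 1) for _ in range(n)]
    phi_b = [0.5 * o + rng.gauss(0, 1) for o in outcome_b]
    return pearson(phi_a, outcome_a), pearson(phi_b, outcome_b)


if __name__ == "__main__":
    r_a, r_b = simulate_direction()
    print(f"Phi causes outcomes: r={r_a:.2f}; outcomes cause Phi: r={r_b:.2f}")
```

Both worlds produce essentially the same positive correlation, so a survey that observes only that correlation has no way to tell them apart. Distinguishing the two would require something more, such as measurements over time or an intervention.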
Going Forward With Phi
Looking beyond the specific research questions under consideration, there are still a lot of interesting questions and considerations raised by the Phi study.
For instance, Suzanne Duncan, the global head of the Center for Applied Research at State Street, says: “Building a culture and environment with aligned purpose, habits and incentives can give organizations a competitive advantage that is sustainable and will benefit clients, the providers themselves and ultimately society as a whole.” And: “When investment professionals are asked to deliver against inappropriate metrics on an inappropriate time horizon, their passion for markets eventually becomes divorced from their true purpose – achieving the long-term goals of the investors they serve.”
Given her description, it seems plausible that firms with high Phi might have better performance. But the whole point of research is to provide empirical evidence to evaluate that claim. And, unfortunately, it doesn’t seem the researchers entirely did so here, because, in the end, they just observed a self-assessed correlation that could have been driven by many other factors.
Now, to state my own bias, Duncan's statement resonates with me. Of course, my sentiments may be misguided, and I should be careful not to merely seek rhetoric that supports ideas that speak to me. But when looking at the increased interest in topics like financial life planning, sustainable and socially responsible investing, contemplating the social impact of one’s career, and even more academic pursuits like the study of “Humanomics” and Oxford University Press’ recently published Economics and the Virtues, there seems to be growing and genuine interest in broad questions related to developing a more “humanistic” conception of our financial industry, as well as the way we engage with markets more generally.
Going forward, it would be interesting to see how Phi relates to actual outcomes and to develop a better understanding of what these self-assessed measures may actually be telling us. If Phi can be shown to impact actual outcomes, or if these particular self-reported measures are found to be meaningful and measurable, then Phi could have significant implications for hiring employees and building a firm’s team and culture. But even if the study falls short of giving us much practical information at this time, the Phi study can be commended for raising some interesting questions.
So what do you think? Does the self-assessed nature of the Phi study’s outcome variables diminish the insights we can gain from this study? Could the study be missing some important variables? Would we benefit from a more “humanistic” understanding of the financial services industry? Please share your thoughts in the comments below!
Brooke says
Couldn’t PHI be a fancy, academic way of saying culture, which has been defined as what employees do when nobody’s looking? Does this establish a truly new lens?