Machine Learning & Equity Investing

This year, we have had more frequent discussions with investors who want to better understand machine learning and why we believe using it gives Euclidean an advantage when investing.  This increasing interest reflects a growing awareness of how machine learning is impacting other aspects of people’s lives.  From their experiences with book recommendations and voice recognition, to reading articles about self-driving cars, machine learning feels increasingly pervasive.  Yet, few people understand what machine learning is and how it differs from traditional methods of computing and analysis.     

What is Machine Learning?

As we have discussed in prior letters, machine learning is core to Euclidean’s investment process.  We use machine learning to examine the operating histories and investment outcomes of thousands of companies over multiple market cycles.  Our goal is to find persistent patterns in the relationships between companies’ operating results, market values, and investment outcomes.  Using these patterns, we invest in companies with characteristics resembling comparable opportunities from the past that performed well as investments.

But, we are often asked, what is machine learning, really?  And, how does it differ from traditional methods of computing and statistical analysis?   

Traditionally, computers have worked by running programs written by software engineers.  A program is like a recipe – it includes a list of commands that the computer can execute to achieve a goal.  The word processor (Microsoft Word) on which we are writing this letter is a good example of a traditional program.  When we press the Control and B keys at the same time, it makes a word bold.  When we select the print option from the file menu, it sends text to the printer.  This logic was written by a software engineer and embedded in the program’s code.  

Many tasks we want computers to perform, however, aren’t easily decomposed by a software engineer into a program.  This is particularly true when tasks are very complex.  For example, the amount of complexity involved in deciphering human handwriting, steering a self-driving car, or evaluating individual companies as potential long-term investments is such that it would be difficult, time-consuming, and likely ineffective for programmers to attempt to capture the “if-then” logic required to automate those kinds of complex systems.  

This is why computer scientists have been motivated to develop machine learning capabilities:  there are many tasks that we want computers to perform that cannot be tackled with traditional methods of codifying logic and insight into programs.  Machine learning provides an alternative to having an engineer program a computer by allowing the computer to program itself.  Put another way, machine learning is an attempt to get machines (i.e., computers) to learn how to do things through experience, in a very similar way to how people learn.   Here’s an example of how this works.  

Machine Learning – Example of Credit Scoring

Imagine you opened a bank and were beginning to offer loans to the people in your city.  How would you determine who was a good credit risk? Perhaps you would look people in the eye to decide if they were upstanding citizens who were likely to pay you back.  You might also seek to understand the nature of their income and financial obligations.  Over time, with experience, you would begin to see patterns in the relationship between the information you captured about people who asked you for credit and the frequency with which they repaid you.  With more experience, your success in evaluating future credit risks would likely improve. 

Well, imagine that your bank is doing a lot more business and opening new branches.  No longer can you personally evaluate each person coming to your bank for credit.  So, you decide to build a computer system to standardize how credit evaluations are done across your growing enterprise.  One way of doing this would be to document what you have learned about credit scoring and have a programmer embed those insights into code.  

There are potential shortcomings to this approach.  First, you may open branches in markets where the demographics and economic environments are different from your home base.  So, what you have learned in your initial branches about evaluating credit risk may not provide the right guide for these new locations.  Second, economic cycles are long.  Even if you have been evaluating credit risks for several years, perhaps there are situations you haven’t yet experienced.  Third, like all of us, you may have deeply rooted biases that cause you to overestimate or underestimate the risk of certain borrowers.  

One way to overcome these issues is to take a statistical approach to this credit scoring problem.  Imagine you have access to a collection of many years of data on millions of individuals, with detail on their original credit profiles and subsequent repayment histories.  The opportunity is to venture beyond your personal direct experiences and learn principles from a much larger body of experience.   Machine learning is simply a tool that can help penetrate the “haze of randomness to the structure underneath” [2] this vast amount of data and codify the aspects of an individual that convey the most information about whether he is likely to repay his loan.  

Using machine learning, it is now possible to compare an individual’s credit profile to the experiences of millions of others who repaid (or defaulted on) their loans.  If your profile resembles people who repaid, then welcome to the bank.  If your profile resembles those who often defaulted, well then, good luck getting your loan.  Evaluating people in this way may seem impersonal, but it is effective.  In one study, a large commercial bank estimated that by making credit assessments using machine learning, it could reduce credit losses by 6-25%. [3]
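To make the idea concrete, here is a minimal sketch in the spirit of the example above, using a simple nearest-neighbor vote: score a new applicant by looking at the most similar past borrowers and seeing whether they repaid.  Everything here is hypothetical and purely illustrative – the features, the toy borrower histories, and the choice of k=3 are invented, and a real credit model would learn from millions of records with far richer features.

```python
# Illustrative sketch only: a toy nearest-neighbor credit classifier.
# The profile features, borrower histories, and k=3 are all hypothetical.

def knn_predict(history, profile, k=3):
    """Score a new applicant by majority vote of the k most similar past borrowers.

    history: list of (features, repaid) pairs, where features is a tuple of
             numbers (e.g., income, debt-to-income ratio) and repaid is 0 or 1.
    profile: feature tuple for the new applicant.
    """
    def distance(a, b):
        # Euclidean distance between two feature tuples.
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    # Find the k past borrowers whose profiles most resemble this applicant.
    nearest = sorted(history, key=lambda rec: distance(rec[0], profile))[:k]
    votes = sum(repaid for _, repaid in nearest)
    return 1 if votes * 2 > k else 0  # 1 = profile resembles people who repaid

# Hypothetical past borrowers: (income in $k, debt-to-income %), repaid?
history = [
    ((90, 10), 1), ((80, 15), 1), ((85, 20), 1),
    ((30, 60), 0), ((25, 70), 0), ((35, 55), 0),
]

print(knn_predict(history, (88, 12)))  # resembles the repayers -> 1
print(knn_predict(history, (28, 65)))  # resembles the defaulters -> 0
```

The point is not the particular algorithm – many would do – but that the decision rule is derived from the recorded experiences rather than hand-coded by a programmer.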

Machine Learning – How Does It Relate to Classical Statistics and Linear Regression?

The above example may seem similar to the notion of finding a statistical correlation between variables. That is, finding how a person’s characteristics are correlated with the likelihood that they will default on a loan. As a result, we are often asked how machine learning differs from statistics. 

In fact, machine learning and statistics are deeply linked, with both fields borrowing heavily from each other.  Despite the link, the problems that machine learning practitioners have generally been attempting to solve have complexities that make them not well suited to classical statistical techniques such as linear regression.  This gap has motivated the development of new algorithms within the field of machine learning that are optimized for finding meaningful, often non-linear, relationships within vast collections of data.  Machine learning’s success at finding important and complex relationships can be seen in functional applications for speech recognition, language translation, facial recognition, and even the ability for cars to drive themselves. [4]

To appreciate why classical statistical techniques and linear models are often insufficient for representing the complexity in these types of applications and how they might be of limited use in evaluating potential equity investments, consider a simple example.  

The Exclusive OR Problem (aka XOR)

This thought experiment shows the limitations of linear modes of analysis.  It is a classic problem, one that would have remained strictly theoretical in my case, but perhaps less so in John’s.  

Think back to college.  Imagine that you have two girlfriends: one named Emma and the other named Kim.  Both Kim and Emma like you a lot but neither one knows about the other.  Imagine further that there is a party coming up on Friday night.  Consider what will determine how your Friday night goes: if exactly one of them comes to the party, you are happy; if both come (and discover each other), or neither comes, you are unhappy.

While this may seem like a trivial problem, this illustration is actually an example of a function that is essential to all of computation.  This function is called XOR (for eXclusive OR) [5] and it cannot be represented by a linear model or machine.  After all, with this example, there is no line you can draw through the two axes representing Emma and Kim’s attendance at the party to successfully separate the space between happy and unhappy outcomes.  Linear modes of representation are severely constrained in their ability to model many simple systems and decisions.  This helps explain our perspective that more robust methods are required for seeking and representing investment principles that withstand the test of time. 
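A small sketch makes the point concrete.  The code below searches a grid of candidate weights for a single linear threshold rule – output 1 when w1·x + w2·y + b > 0 – that reproduces AND, OR, and XOR.  It finds one for the first two but none for XOR (the grid is illustrative, though the impossibility holds for any real-valued weights), while a two-step composition of gates captures XOR easily.

```python
# Why XOR resists a single linear rule: no line separates its 1s from its 0s.

def threshold(w1, w2, b, x, y):
    """A single linear decision rule: output 1 if w1*x + w2*y + b > 0."""
    return 1 if w1 * x + w2 * y + b > 0 else 0

def matches(target, w1, w2, b):
    # Does this linear rule reproduce the target on all four input pairs?
    return all(threshold(w1, w2, b, x, y) == target(x, y)
               for x in (0, 1) for y in (0, 1))

AND = lambda x, y: x & y
OR  = lambda x, y: x | y
XOR = lambda x, y: x ^ y

weights = [i / 2 for i in range(-8, 9)]  # -4.0 .. 4.0 in steps of 0.5

def linearly_separable(target):
    return any(matches(target, w1, w2, b)
               for w1 in weights for w2 in weights for b in weights)

print(linearly_separable(AND))  # True  -- e.g., x + y - 1.5 > 0
print(linearly_separable(OR))   # True  -- e.g., x + y - 0.5 > 0
print(linearly_separable(XOR))  # False -- no single line works

# XOR *can* be expressed by composing linear pieces into a non-linear whole:
composed = lambda x, y: AND(OR(x, y), 1 - AND(x, y))
print(all(composed(x, y) == XOR(x, y) for x in (0, 1) for y in (0, 1)))  # True
```

The composed version is, in miniature, what a multi-layer model does: stack simple linear pieces to represent a relationship that no single linear function can.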

Linear vs. Non-Linear Analysis

At Euclidean, we seek timeless principles by examining 40+ years of quantifiable experience in equity investing.  Our goal using machine learning is to endow a systematic process with the intuition – or pattern-recognition skills – of an expert investor.  

Consider that, lurking in this vast sea of data around public companies and their historical investment outcomes, there may be crucial, non-linear relationships involved in successfully using a company’s operating results to evaluate it as a long-term investment.  What might these relationships look like?  

Well, imagine that you might find in the past – all else being equal – that there are important relationships between companies’ balance sheet leverage, prices (in relation to earnings), and investment outcomes.  Maybe you find that over long periods, companies with some debt tend to do better than those with none.  This could be a sign that when a company has some leverage it reflects both that it is bankable and also that it has enhanced flexibility in pursuing opportunities.  Conversely, maybe you also find that there is a level of indebtedness where companies often have bad enough outcomes that you might not want to own them at almost any price.  How might these relationships appear in linear and non-linear models?

If these hypothetical relationships persisted in the historical record, there is so much about them that linear tools would never see.  We do not think great investors are constrained in this way.  We believe they evaluate current opportunities in the context of their prior experiences and that those evaluations occur in a non-linear manner. 
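As a purely hypothetical illustration – the numbers below are invented, not drawn from our research – suppose subsequent returns rose with modest leverage and collapsed at high leverage.  A straight-line fit through those observations flattens away both the sweet spot and the cliff:

```python
# Hypothetical illustration: an inverted-U relationship between leverage and
# returns, and what an ordinary least-squares line makes of it.

# Invented (debt-to-equity, subsequent annual return %) observations.
data = [(0.0, 6.0), (0.3, 9.0), (0.6, 11.0), (0.9, 10.0),
        (1.2, 7.0), (1.5, 2.0), (1.8, -6.0)]

# Closed-form least-squares fit: return = a * leverage + b.
n = len(data)
sx = sum(x for x, _ in data)
sy = sum(y for _, y in data)
sxx = sum(x * x for x, _ in data)
sxy = sum(x * y for x, y in data)
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - a * sx) / n

# The line's estimate at the actual sweet spot (leverage of 0.6).
predicted_peak = a * 0.6 + b
print(round(a, 2), round(b, 2))     # a downward-sloping line
print(round(predicted_peak, 2))     # 7.5 -- well below the 11.0 observed
```

The line concludes only that "more debt is worse," predicting roughly 7.5% at the leverage level where the hypothetical data actually peak at 11%, and missing the cliff at high leverage entirely.  A model free to bend can represent both features at once.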

Machine Learning – At Euclidean

By using machine learning, we aspire to emulate the way an investor builds expertise, but also to inform the learning process with far more examples than any one investor could experience on his own.  We differ from most investment managers in our desire to simulate performance over long periods of time.  We are just as interested in how well our approach would have performed in 1975 as we are with how it would have in 2005, and we are constantly pushing to go as far back in time, so to speak, as possible.

This is a crucial point. If you find that something worked very well for a short period of time (like how a growth strategy might have worked in the 1990s), there is a reasonable chance that you found a temporary trend.  If you based your investment strategy on an approach that was validated over just a few years, you would (or should!) find yourself in the frightful situation of wondering whether you may wake up tomorrow to find that the trend has stopped.

For us, it has always seemed imperative that we should seek not the best performing strategy for any given time but the most timeless investment methods possible.  The investment principles we have uncovered have persisted across decades, are grounded in common sense, and we believe would have enabled an investor to achieve good results over most long-term periods during the past 40 years. 

Of course, in any given year or period, the approach that works best depends on the circumstances of the day.  During multi-year periods when valuation multiples expand, we may struggle to keep pace with the market.  When price-to-earnings multiples compress, however, we believe that value strategies have a significant advantage.

This is why during times that our performance lags the market, we are not tempted to alter our approach.  We believe we are operating with the weight of history on our side, and we expect that our investors will benefit as we adhere to our process during the decades ahead.  We note that we have done reasonably well since we launched our fund, during a period characterized by generally increasing valuation multiples.  As we look ahead, we feel it is likely that multiples will someday compress from today’s heightened levels.  When they do, we believe our style of investing should perform even better.

[1] All returns are purely historical and are no indication of future performance.

[2] Stanford University, Modern Science and the Bayesian-Frequentist Controversy

[3] MIT Sloan School of Management and Laboratory for Financial Engineering, Consumer credit-risk models via machine-learning algorithms 

[4] The New Yorker, Auto Correct: Has the self-driving car at last arrived?

[5] Computer circuits are built from a small set of basic logic gates that, in combination, allow for universal computation. These operations include the Boolean operators AND, NOT, OR, and XOR (for eXclusive OR). So, if X and Y are Boolean variables, the statement "X AND Y" equals 1 if both X and Y are 1; otherwise, it is zero. XOR is different in that the statement "X XOR Y" equals 1 only if exactly one of X and Y is 1; otherwise, it is zero. 

Describing XOR in terms of the two-girlfriend problem is a little more accessible. As a reminder, you are happy if either Kim or Emma comes to the party, but not both; if both or neither comes, you are unhappy.  If a computer or any abstract processing device were missing these types of logic gates, the computations it could perform would be limited. The key distinction is one of linear separability: AND, NOT, and OR can each be implemented by a single linear function (a simple weighted threshold of the inputs), but XOR cannot.  An XOR gate can, in fact, be built by combining AND, NOT, and OR gates, and that is precisely the point: representing XOR requires composing multiple linear pieces into something non-linear.  XOR is often used as an example because it shows that a simple linear equation cannot perform even some of the most fundamental logical operations.