From the course: The Data Science of Economics, Banking, and Finance

Data science and money

- [Instructor] In courtroom cross-examinations and in wedding proposals, the general rule is never ask a question unless you already know the answer. And wouldn't it be great if you could already know the answers to your financial questions as well? Like whether a particular investment will be profitable, whether a loan application will get accepted, or whether your credit card is safe online. And while it's never possible to be 100% certain about the future, data science can provide useful answers to these questions and a host of others in economics, banking, and finance.
In this course, we're going to take a non-technical look at how data science is applied to a wide range of financial topics, from fraud detection and automated loan applications to trend analysis and economic policy. We'll even talk about financial social media and social banking, but we'll begin by talking about how data science applies to money in general.
Basically, there are a few things to cover here. Number one is data science. This is simply the combination of mathematics and statistics with computer programming in applied settings, a familiar definition by now. But data science and money is the application of those techniques from data science to solve practical problems in finance, banking, and economics. And then finally, I also want to talk about AI and LLMs. We're going to have a special focus on these recent developments, both in artificial intelligence and in large language models, as they apply to financial work.
An important thing to remember, however, is that simple is often good enough, even in large financial work. Many organizations operate in a functionally data-free environment. That means they may or may not have data, but if they do have data, they may not be using it regularly for operational insights. So getting anything rolling at all can be an advantage.
I mean, really, even a little bit of data can lead to actionable insights, or knowing what to do as a result of your analysis. And in that respect, for many organizations and many purposes, spreadsheets are a great tool.
On the other hand, data science can add more than that. It can help quantify the risks and outcomes, which are tremendously important with the billions or even trillions of dollars involved in financial work. Data science methods are well adapted to complex data, something that goes beyond the rows and columns of a spreadsheet and allows you to bring in a whole range of sources in different formats. Also, it can model very complex, nonlinear interactions, which make possible so much of the deep learning and artificial intelligence work behind the data magic. Data science can also automate repetitive processes, and it allows organizations to offer a degree of personalization to their customers and clients that they couldn't offer otherwise.
Now, there are standard sources of economic data. These include economic indicators like stock reports. There's also the performance over time of various financial instruments. And there's also data about client behaviors. How often do they log in? How much do they save? How much do they put in different places? That data has always been used in financial work, and it is still very valuable.
But data science allows you to bring in some novel sources of data. Again, there's unstructured text: things that people say online, things that are in media reports. Or sequence data, looking at a very detailed level at changes over time and at multiple things at once. Also, biometric markers for security, such as people's fingerprints or their face scans, as ways of identifying them to help reduce fraud. There can be network data: the connections between people and also the connections between organizations, especially as you're following the flow of cash. And then there are the inferences you can draw from these diverse sources.
So you can take some of this information and make estimates of, for instance, the probability of a particular customer adding or leaving your service, or the probability of changes in the financial market over time. And you can use those estimates as data that you feed into your other models to try to predict both the market and behaviors.
Now, in terms of the decisions you have to make, you have to decide, for instance, what the acceptable levels of risk are for your models. You have to be willing to accept some risk, or else you wouldn't be involved at all. You also want to look at the relative importance of false positives and false negatives. These aren't always symmetrical. Some models are much stronger on one than the other. If you have a categorization as your outcome, how important is it that you identify these small groups? So, for instance, you weigh the cost of missing a potential fraud against the cost of unnecessarily putting things into that category.
With a lot of financial transactions, the data is coming in in near real time. You may need to look at differences of microseconds. So decisions must, in that case, be completely automated, and algorithms must be optimized for performance. If, on the other hand, you are dealing with human-made decisions, then you don't have that same level of pressure, and you can use a model that allows more attention to what's going on and is then used to inform a human decision maker.
And then there's the issue of transparency. This is most obvious when you're dealing with a data science model like deep learning or artificial intelligence, which can do amazing things. But how the data goes from the beginning of the model to the end can be really hard to understand, even functionally impossible. And there are policies and legal requirements that affect this. So in certain situations, you can't use these particular models, and you need to look specifically at the context that you're operating in.
And how does that affect your choice of data science methods? Now, speaking of methods of analyzing the data, you usually want to start at a very basic level. I say start with spreadsheets. A huge amount of very important work has been done in spreadsheets. If you saw the movies "The Big Short" or "Dumb Money," you saw that our central figures were using spreadsheets as their primary method of analyzing financial data.
Regression models are very simple to create, are very flexible and strong, and often make predictions that are as good as those of much more complicated models. And regression models are easy to diagnose, they're easy to interpret, and they match the transparency required in a lot of regulatory situations. There are also methods for clustering data, or methods for deep learning, that allow you to capture these nonlinear interactions and make more complex inferences. And then, of course, data science over the last year and a half has strongly emphasized artificial intelligence and large language models for doing a wide range of tasks, which I'll talk about more in a later video.
But some of the most important things to consider are the risks and the rewards that are involved in using data science in financial applications. There are amazing things that you can accomplish as long as you are attentive to the strengths of each procedure and its particular challenges, and you accommodate those appropriately. These methods can give you a huge competitive advantage if you're an organization. Or, as a consumer, they can help you make better decisions about your own personal finances.
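As a rough illustration of the point about false positives and false negatives not being symmetrical, here is a minimal Python sketch. It is not from the course. The fraud scores, labels, and dollar costs are made-up assumptions, and `total_cost` and `best_threshold` are hypothetical helper names. The idea is simply that the cutoff for flagging a transaction shifts once a missed fraud costs more than a false alarm.

```python
# Hypothetical data: model scores (higher = more suspicious) and true labels.
# All numbers here are invented for illustration only.
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 0, 1, 1]  # 1 = fraud, 0 = legitimate


def total_cost(scores, labels, threshold, cost_fp, cost_fn):
    """Cost of flagging every transaction whose score is >= threshold."""
    cost = 0.0
    for score, is_fraud in zip(scores, labels):
        flagged = score >= threshold
        if flagged and not is_fraud:
            cost += cost_fp  # false positive: a legitimate transaction flagged
        elif not flagged and is_fraud:
            cost += cost_fn  # false negative: a fraud that slipped through
    return cost


def best_threshold(scores, labels, cost_fp, cost_fn):
    """Try each observed score (plus 'flag nothing') and keep the cheapest."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(
        candidates,
        key=lambda t: total_cost(scores, labels, t, cost_fp, cost_fn),
    )


# Equal costs: a strict cutoff that flags only the most suspicious cases.
equal_costs = best_threshold(scores, labels, cost_fp=1, cost_fn=1)

# Missed fraud assumed 10x as costly: the cutoff drops to catch more fraud.
costly_misses = best_threshold(scores, labels, cost_fp=1, cost_fn=10)

print(equal_costs, costly_misses)
```

With these invented numbers, the chosen cutoff is lower when missed fraud is assumed to be ten times as expensive as a false alarm, so more legitimate transactions get flagged in exchange for catching every fraud. In practice the costs would come from the business context, not from the model.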
