Fintech in Investment Management | IFT World
IFT Notes for Level I CFA® Program

R55 Fintech in Investment Management

Part 1


1. Introduction and What is Fintech

This reading is divided into seven main sections. Section 2 covers ‘What is Fintech?’ Sections 3 and 4 cover ‘Big data’, ‘artificial intelligence’, and ‘machine learning’. Section 5 covers data science. Section 6 covers applications of fintech to investment management. Finally, section 7 covers distributed ledger technology.

1.1 What is Fintech?

The term ‘Fintech’ comes from combining ‘Finance’ and ‘Technology’. Fintech refers to technological innovation in the design and delivery of financial products and services.

Though the term ‘Fintech’ is relatively new, its earlier forms involved data processing and automation. More recent advances include the development of decision-making applications.

The major drivers of fintech have been:

  • Rapid growth in data
  • Technological advances

While Fintech spans the entire finance space, this reading focuses on fintech applications in the investment management industry. The major applications are:

  • Analysis of large datasets
  • Analytical tools
  • Automated trading
  • Automated advice
  • Financial record keeping

2. Big Data

Big Data refers to the vast amounts of data generated by industry, governments, individuals, and electronic devices. Characteristics of big data typically include:

  • Volume: Over the last few decades, the amount of data that we are dealing with has grown exponentially.
  • Velocity: In the past we often worked with batch processing; however, we are now increasingly working with real-time data.
  • Variety: Historically we only dealt with structured data. However, we are now also dealing with unstructured data such as text, audio, video, etc.

2.1 Sources of Big Data

Traditional data sources include annual reports, regulatory filings, and trade prices and volumes. Alternative data come from many other sources and types of data. A simple classification of alternative data sources is shown in Exhibit 2 of the curriculum.

Individuals                  Business Processes    Sensors
Social media                 Transaction data      Satellites
News, reviews                Corporate data        Geolocation
Web searches, personal data                        Internet of Things
                                                   Other sensors

2.2 Big Data Challenges

While big data can be a huge asset, it also poses several challenges. The quality of the data may be questionable: it may contain biases, outliers, etc. The volume of data may also be a problem, as we might be dealing with too much or too little data. Another concern is the appropriateness of the data. In most cases, working with big data involves cleansing and organizing the data before we start analyzing it.
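As a rough illustration of the cleansing step mentioned above, the sketch below drops missing values and then filters outliers using the median absolute deviation (MAD). The function name `cleanse`, the cutoff `k = 5`, and the sample prices are all hypothetical choices made for this example; they are not taken from the curriculum, and other outlier rules are equally reasonable.

```python
from statistics import median

def cleanse(values, k=5.0):
    """Drop missing values, then drop points far from the median.

    A point is treated as an outlier when its distance from the median
    exceeds k times the median absolute deviation (MAD). The cutoff k
    is an assumption chosen for illustration.
    """
    present = [v for v in values if v is not None]
    if not present:
        return []
    center = median(present)
    mad = median([abs(v - center) for v in present])
    if mad == 0:
        # All remaining points are (nearly) identical; nothing to filter.
        return present
    return [v for v in present if abs(v - center) <= k * mad]

# Hypothetical raw price series with one missing value and one bad tick.
raw_prices = [101.2, 100.8, None, 102.1, 9999.0, 101.5, 100.9]
clean_prices = cleanse(raw_prices)
print(clean_prices)  # [101.2, 100.8, 102.1, 101.5, 100.9]
```

Using the median rather than the mean keeps the filter itself from being distorted by the very outliers it is trying to remove.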

3. Advanced Analytical Tools: Artificial Intelligence and Machine Learning

Artificial intelligence (AI) computer systems perform tasks that have traditionally required human intelligence. They exhibit cognitive and decision-making ability comparable or superior to that of human beings. An important term in this context is ‘neural networks’. It refers to programming based on how the brain learns and processes information. There are examples of AI all around us, such as chess-playing computer programs and digital assistants like Apple’s Siri.

Machine learning (ML) refers to computer-based techniques that “extract knowledge from large amounts of data by ‘learning’ from known examples and then generating structure or predictions” without relying on any help from a human. ML algorithms aim to “find the pattern, apply the pattern.”

In ML, the dataset is divided into three distinct subsets:

  1. Training dataset: It allows the algorithm to identify relationships between inputs and outputs based on historical patterns in the data.
  2. Validation dataset: It is used to validate and tune the relationships identified from the training dataset.
  3. Test dataset: As the name implies, this dataset is used to test the model’s ability to predict well on new data.

Once an algorithm has mastered the training and validation datasets, it can be used to predict outcomes based on other datasets.
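The three-way split described above can be sketched as follows. This is a minimal example under stated assumptions: the function name `split_dataset`, the 60/20/20 proportions, and the fixed seed are hypothetical and chosen only for illustration.

```python
import random

def split_dataset(rows, train_frac=0.6, valid_frac=0.2, seed=42):
    """Shuffle a copy of rows and cut it into training, validation, and test sets.

    The fractions and the seed are illustrative assumptions; whatever is
    left after the training and validation cuts becomes the test set.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test

# Hypothetical dataset: 100 observations identified by an index.
observations = list(range(100))
train_set, valid_set, test_set = split_dataset(observations)
print(len(train_set), len(valid_set), len(test_set))  # 60 20 20
```

A random shuffle is only one way to split the data. For time-ordered financial data, a chronological split (earlier observations for training, later ones for testing) is often more appropriate, because it avoids training on information from the future.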

Broadly speaking there are three main approaches to machine learning:

  1. Supervised learning: In supervised learning, both inputs and outputs are identified or labeled. After learning from labeled data, the trained algorithm is used to predict outcomes for new data sets.
  2. Unsupervised learning: In unsupervised learning, the input and output variables are not labeled. Here we want the ML algorithm to seek relationships on its own.

  3. Deep learning: In deep learning (or deep learning nets), computers use neural networks to perform multistage, non-linear data processing to identify patterns. Deep learning can use supervised or unsupervised machine learning approaches.

With terms like AI and ML, one might think that human judgment is not required, but that is far from the truth. For ML to work well, good human judgment is required for questions such as which data to use, how much data to use, and which analytical techniques are relevant in the given context. Human judgment may also be needed to clean and filter the data before it is fed to the ML algorithm.
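The difference between supervised and unsupervised learning can be made concrete with a toy example. In the sketch below, the supervised part learns one average value per label from labeled data (a nearest-centroid classifier), while the unsupervised part receives the same numbers without labels and groups them on its own (a one-dimensional two-cluster k-means). Both techniques, the function names, and the data are assumptions chosen for illustration; the curriculum does not prescribe any particular algorithm.

```python
def fit_centroids(values, labels):
    """Supervised: compute the average value for each known label."""
    grouped = {}
    for value, label in zip(values, labels):
        grouped.setdefault(label, []).append(value)
    return {label: sum(vs) / len(vs) for label, vs in grouped.items()}

def predict(centroids, value):
    """Assign a new value to the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

def two_means(values, iterations=10):
    """Unsupervised: split unlabeled values into two groups (1-D k-means, k = 2).

    Assumes the data contains at least two distinct values.
    """
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            idx = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[idx].append(v)
        centers = [sum(g) / len(g) for g in groups]
    return groups

# Hypothetical data: the same six numbers, with and without labels.
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels = ["low", "low", "low", "high", "high", "high"]

centroids = fit_centroids(values, labels)   # uses the labels
print(predict(centroids, 4.0))              # high

clusters = two_means(values)                # never sees the labels
print([sorted(c) for c in clusters])        # [[0.8, 1.0, 1.2], [4.8, 5.0, 5.2]]
```

Note that the unsupervised result contains groups but no names for them: deciding that one cluster means "low" and the other "high" is left to the analyst.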

Some challenges associated with machine learning are:

  • Over-fitting the data: Sometimes an algorithm may try to be too precise in the way it interprets data and predicts outcomes. This leads to over-trained models and may result in data mining bias. We try to mitigate this issue by having a good validation dataset.
  • Black box: ML techniques can be opaque or black box, which means we have predictions that are not very easy to understand or to explain.
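Over-fitting can be illustrated with a small, deliberately artificial example. The sketch below compares a model that memorizes every training point (a 1-nearest-neighbor lookup) with a much simpler threshold rule. All data, the two noisy training labels, and both model functions are hypothetical and constructed for this illustration only.

```python
# Hypothetical toy data as (feature, label) pairs. The underlying pattern is
# label = 1 when the feature exceeds 0.5, but two training labels
# (at 0.3 and 0.7) are deliberately noisy.
train = [(0.1, 0), (0.2, 0), (0.3, 1), (0.4, 0),
         (0.6, 1), (0.7, 0), (0.8, 1), (0.9, 1)]
valid = [(0.12, 0), (0.28, 0), (0.33, 0), (0.45, 0),
         (0.55, 1), (0.68, 1), (0.72, 1), (0.88, 1)]

def memorizing_model(x):
    """Over-fitted model: copy the label of the closest training point."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]

def simple_model(x):
    """Simple model: a single threshold, assumed known for this sketch."""
    return 1 if x > 0.5 else 0

def accuracy(model, data):
    """Fraction of observations the model labels correctly."""
    hits = sum(1 for x, label in data if model(x) == label)
    return hits / len(data)

print(accuracy(memorizing_model, train), accuracy(memorizing_model, valid))  # 1.0 0.5
print(accuracy(simple_model, train), accuracy(simple_model, valid))          # 0.75 1.0
```

The memorizing model is perfect on the training data because it has also learned the noise, and it performs worse on the validation data as a result. Comparing training and validation accuracy in this way is how a validation dataset helps to detect over-fitting.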

Despite these challenges and weaknesses, the importance of ML in finance and investment management has been growing substantially. In the next few sections, we will look at specific applications of AI and ML in the context of investment management.

4. Data Science: Extracting Information from Big Data

Data science leverages advances in computer science, statistics, and other disciplines for the purpose of extracting information from Big Data.

4.1 Data Processing Methods

Data processing methods include:

  • Capture: Refers to how data is collected from various sources and transformed into a format that can be used by the analytical process.
  • Curation: Refers to the process of ensuring data quality and accuracy through data cleaning.
  • Storage: Refers to how data will be recorded, archived, and accessed. It also refers to the design of the underlying database. An important consideration here is whether the data is structured, unstructured, or both. We also need to consider whether the analytical tools need real-time access to the data.
  • Search: Refers to how we can find what we want from the vast amount of data.
  • Transfer: Refers to how data will move from the underlying source to the analytical tools that are being used.
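The processing methods listed above can be strung together in a small pipeline. The sketch below is one possible arrangement using only the Python standard library: CSV text stands in for the captured source, an in-memory SQLite database stands in for storage, and a SQL query stands in for search. The tickers, prices, table layout, and function names are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw feed; one row is missing its closing price.
RAW_CSV = """ticker,date,close
AAA,2024-01-02,10.5
BBB,2024-01-02,
AAA,2024-01-03,10.7
BBB,2024-01-03,20.1
"""

def capture(text):
    """Capture: read the raw source into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def curate(rows):
    """Curation: drop rows with a missing price and convert types."""
    cleaned = []
    for row in rows:
        if row["close"].strip() == "":
            continue
        cleaned.append((row["ticker"], row["date"], float(row["close"])))
    return cleaned

def store(records):
    """Storage: load the curated records into a structured database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prices (ticker TEXT, date TEXT, close REAL)")
    conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", records)
    conn.commit()
    return conn

def search(conn, ticker):
    """Search: find the rows we want among everything stored."""
    cursor = conn.execute(
        "SELECT date, close FROM prices WHERE ticker = ? ORDER BY date",
        (ticker,),
    )
    return cursor.fetchall()

conn = store(curate(capture(RAW_CSV)))
# Transfer: hand the query result to whatever analytical tool comes next.
aaa_prices = search(conn, "AAA")
print(aaa_prices)  # [('2024-01-02', 10.5), ('2024-01-03', 10.7)]
```

This example handles only structured data. Unstructured sources such as text or images would need a different storage design, which is the consideration raised under Storage above.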

4.2 Data Visualization

Another aspect of data science is data visualization. This refers to how the data will ultimately be presented to the analyst/user. Historically, data visualization happened through graphs, charts, etc. However, in more recent times tools such as heat maps, tree diagrams, and tag clouds are also being used.

An example of a heat map is a map of a city where routes with high traffic congestion are shown in red. A tag cloud is a technique applicable to textual data. Words that appear more often are shown in a larger font, whereas words that appear less often are shown with a smaller font. This helps us to quickly evaluate how consumers/users are talking about a given product.
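The sizing logic behind a tag cloud can be sketched in a few lines: count how often each word appears, then scale the font size linearly between a minimum and a maximum. The function name `tag_cloud_sizes`, the font-size range, the stop-word list, and the sample review text are hypothetical choices for this example.

```python
import re
from collections import Counter

# Common words to ignore; this short list is an assumption for the example.
STOP_WORDS = frozenset({"the", "a", "is", "are", "and", "but", "to", "of"})

def tag_cloud_sizes(text, min_size=10, max_size=40):
    """Map each word to a font size proportional to how often it appears."""
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOP_WORDS]
    counts = Counter(words)
    if not counts:
        return {}
    lowest, highest = min(counts.values()), max(counts.values())
    sizes = {}
    for word, n in counts.items():
        if highest == lowest:
            sizes[word] = max_size  # every word is equally frequent
        else:
            scale = (n - lowest) / (highest - lowest)
            sizes[word] = round(min_size + scale * (max_size - min_size))
    return sizes

# Hypothetical customer feedback about a trading app.
reviews = "The app is fast. Fast trades, fast support, but fees are high. High fees!"
sizes = tag_cloud_sizes(reviews)
print(sizes["fast"], sizes["fees"], sizes["app"])  # 40 25 10
```

Here "fast" appears three times and receives the largest font, "fees" and "high" appear twice and receive a medium font, and words that appear once receive the smallest font. A plotting library would then draw the words at these sizes.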