IFT Notes for Level I CFA® Program

A **tree-map** is a graphical tool for displaying categorical data. It consists of a set of colored rectangles that represent distinct groups. The area of each rectangle is proportional to the value of the corresponding group. Additional dimensions of categorical data can be displayed by a set of nested rectangles.
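The area rule can be illustrated with a minimal sketch. It assumes the simplest possible layout, where the canvas is cut into vertical strips of equal height so that each strip's area is proportional to its group's value. The function name `treemap_strips` and the sector figures are hypothetical, and real tree-map tools generally use more elaborate nested layouts.

```python
def treemap_strips(values, width=100.0, height=60.0):
    """Split a width x height canvas into vertical strips, one per group.

    Every strip spans the full height, so its width (and hence its area)
    is proportional to the group's value.
    Returns {group: (x, y, strip_width, strip_height)}.
    """
    total = sum(values.values())
    if total <= 0:
        raise ValueError("values must sum to a positive number")
    layout = {}
    x = 0.0
    for group, value in values.items():
        strip_width = width * value / total
        layout[group] = (x, 0.0, strip_width, height)
        x += strip_width  # the next strip starts where this one ends
    return layout


# Hypothetical sector values (illustrative numbers only).
sector_values = {"Technology": 50, "Financials": 30, "Energy": 20}
for sector, (x, y, w, h) in treemap_strips(sector_values).items():
    print(f"{sector}: x = {x:.0f}, area = {w * h:.0f}")
```

A group with half of the total value receives half of the canvas area, which is the property that lets a reader compare groups by eye.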

A **word cloud** (also known as a **tag cloud**) is a visual device for representing textual data. The size of each word is proportional to the frequency of the word in the given text.

A sample word cloud is shown below.

Color can sometimes be used to add another dimension. For example, in a word cloud based on analyst reports about a particular company, different colors can distinguish positive, negative, and neutral sentiment words: green for positive sentiment, red for negative sentiment, and blue for neutral sentiment.
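The two encodings used by a word cloud (size for frequency, color for sentiment) can be sketched as follows. The sentiment word lists, the function name `word_cloud_spec`, and the `base_size` parameter are assumptions made for illustration. A real application would likely use a proper sentiment lexicon and remove common stop words first.

```python
import re
from collections import Counter

# Hypothetical sentiment word lists (assumptions for this sketch).
POSITIVE = {"growth", "strong", "beat"}
NEGATIVE = {"loss", "weak", "decline"}


def word_cloud_spec(text, base_size=10):
    """Return {word: (font_size, color)} for a simple word cloud.

    Font size is proportional to the word's frequency in the text.
    Color follows the scheme in the notes: green for positive,
    red for negative, and blue for neutral sentiment.
    """
    words = re.findall(r"[a-z']+", text.lower())
    spec = {}
    for word, count in Counter(words).items():
        if word in POSITIVE:
            color = "green"
        elif word in NEGATIVE:
            color = "red"
        else:
            color = "blue"
        spec[word] = (base_size * count, color)
    return spec


report = "Strong growth and strong sales despite one loss"
for word, (size, color) in word_cloud_spec(report).items():
    print(word, size, color)
```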

A **line chart** is a type of graph used to visualize ordered observations. It is often used to display the change in a data series over time. A line chart can plot more than one set of data points, which helps in making comparisons. A sample line chart is shown below. After the 2008 crisis, stock prices dropped and unemployment rose.

A **bubble line chart** is a special type of line chart that uses varying-sized bubbles as data points to represent an additional dimension of data.

The following chart plots the quarterly revenue and EPS for a company over a two-year period. The x-axis represents time and the y-axis represents revenue. The line represents revenue. Each revenue data point is replaced by a circular bubble representing the EPS in the corresponding quarter. The size of each bubble is proportional to the magnitude of the EPS. The bubbles are also color coded: red represents losses and green represents profits.
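As a rough sketch of this encoding, the helper below turns one quarter's EPS into a bubble size and color. The name `bubble_style`, the `scale` factor, and the quarterly figures are hypothetical. Treating an EPS of exactly zero as green is also an assumption, since the notes only distinguish losses from profits.

```python
def bubble_style(eps, scale=100.0):
    """Return (size, color) for one quarter's bubble.

    Size is proportional to the magnitude of EPS. Color is red for a
    loss and green for a profit (zero EPS is treated as green here,
    which is an assumption of this sketch).
    """
    size = scale * abs(eps)
    color = "red" if eps < 0 else "green"
    return size, color


# Hypothetical data: (quarter, revenue in $ millions, EPS in $).
quarters = [("Q1", 120, 0.50), ("Q2", 110, -0.25), ("Q3", 130, 1.00)]

# Each point sits at (time, revenue); EPS only controls the bubble's look.
points = [(q, revenue, *bubble_style(eps)) for q, revenue, eps in quarters]
for point in points:
    print(point)
```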

A **scatter plot** is a type of graph used to visualize the joint variation in two numerical variables. It is constructed with the x-axis representing one variable and the y-axis representing the other variable. Dots are drawn to indicate the values of the two variables at different points in time.

The pattern of a scatter plot may indicate no relationship, a linear relationship, or a non-linear relationship between the two variables. In the case of a linear relationship, a positive slope indicates that the variables move in the same direction, whereas a negative slope indicates that the variables move in opposite directions.
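The sign of the slope can also be checked numerically. The sketch below fits an ordinary least-squares line, which is one common way to summarize a linear pattern. The function names and the sample data are illustrative assumptions, and a fitted slope says nothing about non-linear relationships, which is why inspecting the plot itself still matters.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    return sxy / sxx


def direction(xs, ys):
    """Describe how two variables move together, based on the slope sign."""
    slope = ols_slope(xs, ys)
    if slope > 0:
        return "same direction"
    if slope < 0:
        return "opposite directions"
    return "no linear relationship"


# Hypothetical observations of two variables.
print(direction([1, 2, 3], [2, 4, 6]))
print(direction([1, 2, 3], [6, 4, 2]))
```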

A **scatter plot matrix** organizes scatter plots between pairs of variables into a matrix format. This makes it easy to inspect all pairwise relationships in one combined visual.

A **heat map** is a type of graphic that organizes and summarizes data in a tabular format and represents it using a color spectrum.

Heat maps are often used in displaying frequency distributions or visualizing the degree of correlation among different variables.

A sample heat map for a portfolio is shown below. Cells in the chart are color coded to differentiate high values from low values. Blue represents lower values whereas orange represents higher values.
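One simple way to produce such a color coding is to interpolate linearly between a "low" color and a "high" color. The sketch below does this for a small table. The exact RGB values chosen for blue and orange, the function name `heat_color`, and the table values are assumptions for illustration only.

```python
BLUE = (0, 0, 255)      # assumed RGB for low values
ORANGE = (255, 165, 0)  # assumed RGB for high values


def heat_color(value, low, high):
    """Linearly interpolate an RGB color between BLUE (low) and ORANGE (high)."""
    if high <= low:
        raise ValueError("high must exceed low")
    # Clamp so that out-of-range values take the end colors.
    t = min(max((value - low) / (high - low), 0.0), 1.0)
    return tuple(round(b + t * (o - b)) for b, o in zip(BLUE, ORANGE))


# Hypothetical table of portfolio weights (%), rows and columns unlabeled.
table = [[5, 20], [35, 10]]
flat = [v for row in table for v in row]
colors = [[heat_color(v, min(flat), max(flat)) for v in row] for row in table]
for row in colors:
    print(row)
```

The smallest cell is drawn in pure blue and the largest in pure orange, with intermediate values shaded in between.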

The intended purpose of visualizing data (i.e., whether it is for exploring/presenting distributions or relationships or for making comparisons) is the main factor that helps select the appropriate chart type.

To explore/present relationships between variables we can use the following visualization types:

- Two variables: Scatter plot.
- More than two variables: Scatter plot matrix, heat map.

To explore/present distributions we can use the following visualization types:

- Numerical data: Histogram, frequency polygon, cumulative distribution chart.
- Categorical data: Bar chart, tree map, heat map.
- Unstructured data: Word cloud.

To make comparisons we can use the following visualization types:

- Comparison among categories: Bar chart, tree map, heat map.
- Comparison over time: Line chart for two variables, bubble line chart for three variables.
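The selection guide above can be summarized as a small lookup table. The sketch below encodes it directly. The key wording and the function name `suggest_charts` are choices made for this illustration.

```python
# Chart selection guide from the notes, keyed by (purpose, kind of data).
CHART_GUIDE = {
    ("relationships", "two variables"): ["scatter plot"],
    ("relationships", "more than two variables"): [
        "scatter plot matrix", "heat map"],
    ("distributions", "numerical data"): [
        "histogram", "frequency polygon", "cumulative distribution chart"],
    ("distributions", "categorical data"): [
        "bar chart", "tree map", "heat map"],
    ("distributions", "unstructured data"): ["word cloud"],
    ("comparisons", "among categories"): [
        "bar chart", "tree map", "heat map"],
    ("comparisons", "over time"): [
        "line chart (two variables)", "bubble line chart (three variables)"],
}


def suggest_charts(purpose, data_kind):
    """Return the chart types suggested for a purpose and kind of data.

    Unknown combinations return an empty list rather than raising.
    """
    key = (purpose.lower(), data_kind.lower())
    return list(CHART_GUIDE.get(key, []))


print(suggest_charts("Distributions", "Unstructured data"))
print(suggest_charts("Comparisons", "Over time"))
```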

**Common pitfalls**

Four common pitfalls that should be avoided are:

- **Improper chart type**: To examine the correlation between two variables, a scatter plot should be used. If a line chart is used instead, it will be difficult to examine the correlation.
- **Selectively plotted data**: Selecting an overly short time period may show the presence of a trend that is actually noise. For example, over a period of a few days it may appear that a stock is in a downtrend, but over the last two years it is clear that the stock is in a general uptrend with a few days of consolidation in between.
- **Improperly plotting data in a truncated graph**: For example, suppose a vertical bar chart is used to compare the EPS of two companies, A and B. A has an EPS of $14 and B has an EPS of $15. If the y-axis starts at $13, then the bar heights would inaccurately imply that B’s EPS is twice that of A.
- **Improper scaling of axes**: For example, consider a line chart of the EPS of a company over time. The EPS is generally in the $10 – $20 range. If we set the y-axis to plot numbers up to $100, then the graph will be compressed and will appear to be less steep and less volatile than if we set the y-axis to plot numbers only up to $25.
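The arithmetic behind the last two pitfalls can be worked through in a short sketch. The function names are hypothetical, and the axis-scaling example assumes the y-axis starts at $0, which the notes do not state.

```python
def drawn_ratio(a, b, baseline=0.0):
    """Ratio of bar B's drawn height to bar A's when the axis starts at baseline."""
    if a <= baseline:
        raise ValueError("bar A must rise above the baseline")
    return (b - baseline) / (a - baseline)


def axis_share(data_low, data_high, axis_min, axis_max):
    """Fraction of the axis height occupied by the data range."""
    if axis_max <= axis_min:
        raise ValueError("axis_max must exceed axis_min")
    return (data_high - data_low) / (axis_max - axis_min)


# Truncated graph: EPS of $14 (A) and $15 (B).
print(drawn_ratio(14, 15, baseline=13))  # bar heights are $1 and $2
print(drawn_ratio(14, 15, baseline=0))   # true ratio of the two EPS values

# Axis scaling: EPS in the $10 to $20 range, axis assumed to start at $0.
print(axis_share(10, 20, 0, 100))  # share of the axis when it runs to $100
print(axis_share(10, 20, 0, 25))   # share of the axis when it runs to $25
```

With the y-axis starting at $13, B's bar is drawn twice as tall as A's even though its EPS is only about 7% higher. With the axis running to $100, the EPS range fills 10% of the chart height, compared with 40% when the axis stops at $25, which is why the first version looks flatter.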