 ## Category: 101 Concepts for Level II

#### 26-11-19 Essential Concept 1: Ethical Responsibilities Required by the Code and Standards

I. Professionalism

- A. Knowledge of the law: Understand and comply with all applicable laws, rules, and regulations. In case of a conflict, comply with the stricter law. Do not knowingly participate in any violation, and dissociate from such activity.
- B. Independence and objectivity: Use reasonable care and judgment. Maintain independence and objectivity. Do not offer, solicit, or accept gifts that could reasonably be expected to compromise independence and objectivity; however, small token gifts are acceptable.
- C. Misrepresentation: Do not omit important facts. Avoid plagiarism (the practice of taking someone else's work or ideas and passing them off as one's own). Do not guarantee investment performance.
- D. Misconduct: Do not lie,…

#### 26-11-19 Essential Concept 2: Standard Error of Estimate, Coefficient of Determination, Confidence Interval for a Regression Coefficient

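Standard error of estimate (SEE) measures how well a given linear regression model captures the relationship between the dependent and the independent variables. With one independent variable:

$$SEE = \sqrt{\frac{\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2}{n-2}} = \sqrt{\frac{SSE}{n-2}}$$

The lower the SEE, the better the fit of the regression line. Note that the sum of the squared error terms, $\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$, is written as SSE.

The coefficient of determination ($R^2$) measures the fraction of the total variation in the dependent variable that is explained by the independent variable:

$$\text{Total variation} = \text{Unexplained variation} + \text{Explained variation}$$

$$R^2 = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{\text{Total variation} - \text{Unexplained variation}}{\text{Total variation}}$$

In a regression with one independent variable, $R^2$ is the square of the correlation between the…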
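A minimal plain-Python sketch of how SEE and $R^2$ fall out of SSE and total variation for a regression with one independent variable. The function name `fit_stats` and the sample data are illustrative assumptions, not taken from the source.

```python
import math


def fit_stats(x, y):
    """Fit y = b0 + b1*x by OLS and return (SEE, R^2) for one independent variable."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    b0 = y_bar - b1 * x_bar

    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)                        # total variation
    see = math.sqrt(sse / (n - 2))  # n - 2 degrees of freedom with one independent variable
    r2 = (sst - sse) / sst          # explained variation / total variation
    return see, r2


# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
see, r2 = fit_stats(x, y)
print(f"SEE = {see:.4f}, R^2 = {r2:.4f}")
```

The `n - 2` divisor is the error degrees of freedom for a single independent variable; with k independent variables it becomes n - k - 1.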

#### 02-12-19 Essential Concept 3: Analysis of Variance (ANOVA)

Analysis of variance (ANOVA) is a statistical procedure for dividing the variability of a variable into components that can be attributed to different sources. We use ANOVA to determine the usefulness of the independent variable or variables in explaining variation in the dependent variable.

ANOVA table:

| Source of variation | Degrees of freedom | Sum of squares | Mean sum of squares |
|---|---|---|---|
| Regression (explained variation) | $k$ | RSS | $MSR = RSS/k$ |
| Error (unexplained variation) | $n - k - 1$ | SSE | $MSE = SSE/(n - k - 1)$ |
| Total variation | $n - 1$ | SST | |

Here $n$ is the number of observations and $k$ is the number of independent variables. With one independent variable, $k = 1$. Hence,…
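The ANOVA table reduces to a few lines of arithmetic. This sketch assumes RSS, SSE, n, and k are already given (as in a typical exam vignette) and also computes three standard derived quantities not shown in the table: the F-statistic $F = MSR/MSE$, $R^2 = RSS/SST$, and $SEE = \sqrt{MSE}$. The function name `anova_table` and the input numbers are hypothetical.

```python
import math


def anova_table(rss, sse, n, k):
    """ANOVA quantities from explained (RSS) and unexplained (SSE) sums of squares."""
    df_reg, df_err = k, n - k - 1
    msr = rss / df_reg
    mse = sse / df_err
    sst = rss + sse  # total variation = explained + unexplained
    return {
        "df": (df_reg, df_err, n - 1),
        "MSR": msr,
        "MSE": mse,
        "F": msr / mse,  # tests whether the independent variables jointly explain Y
        "R2": rss / sst,
        "SEE": math.sqrt(mse),
    }


# Hypothetical inputs: 32 observations, one independent variable.
table = anova_table(rss=60.0, sse=40.0, n=32, k=1)
for name, value in table.items():
    print(name, value)
```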

#### 02-12-19 Essential Concept 4: Confidence Interval of Regression Coefficient, Predicted Value of the Dependent Variable (Y)

The confidence interval for a regression coefficient is given by:

$$\hat{b}_1 \pm t_c \, s_{\hat{b}_1}$$

where:

- $t_c$ is the critical t value for a given level of significance and degrees of freedom
- $s_{\hat{b}_1}$ is the standard error of the regression coefficient

To calculate the predicted value of the dependent variable Y, we use a three-step process:

1. Calculate estimates of the regression coefficients.
2. Assume values for the independent variables.
3. Use the regression equation:

$$\hat{Y} = \hat{b}_0 + \hat{b}_1 X_1 + \hat{b}_2 X_2 + \dots + \hat{b}_k X_k$$
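A short sketch of both calculations: the interval $\hat{b}_1 \pm t_c \, s_{\hat{b}_1}$ and the three-step prediction. The critical value is passed in as if read from a t-table (2.048 is the two-tailed 5% value for 28 degrees of freedom); the coefficient estimates, standard error, and assumed X values are made up for illustration, and the function names are hypothetical.

```python
def coefficient_ci(b_hat, se_b, t_crit):
    """Confidence interval b_hat +/- t_crit * se_b for one regression coefficient."""
    half_width = t_crit * se_b
    return b_hat - half_width, b_hat + half_width


def predict(b0, slopes, xs):
    """Step 3: Y_hat = b0 + b1*X1 + ... + bk*Xk for the assumed X values."""
    if len(slopes) != len(xs):
        raise ValueError("need one assumed X value per slope coefficient")
    return b0 + sum(b * x for b, x in zip(slopes, xs))


# Illustrative inputs only. 2.048 is the two-tailed 5% critical t value for 28 df.
low, high = coefficient_ci(b_hat=0.76, se_b=0.12, t_crit=2.048)
print(f"95% CI: ({low:.4f}, {high:.4f})")

# Steps 1-2: hypothetical estimated coefficients and assumed values of X1, X2.
y_hat = predict(b0=0.5, slopes=[0.76, -0.2], xs=[2.0, 1.5])
print(f"Predicted Y: {y_hat:.2f}")
```

Because zero lies outside the resulting interval (about 0.514 to 1.006), the slope in this made-up example would be judged significantly different from zero at the 5% level.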

#### 03-12-19 Essential Concept 5: Problems in Regression Analysis

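| Problem | Effect | Solution |
|---|---|---|
| Heteroskedasticity: the variance of the error term is not constant. Test using the Breusch-Pagan (BP) test: $BP = nR^2$, where $R^2$ comes from regressing the squared residuals on the independent variables. | F-test is unreliable. Standard errors underestimated; t-stats overstated. | Robust standard errors; generalized least squares |
| Serial correlation: error terms are correlated. Test using the Durbin-Watson (DW) statistic: $DW \approx 2(1 - r)$, where $r$ is the correlation between consecutive residuals. | With positive serial correlation: F-stat too high. Standard errors underestimated; t-stats overstated. | Hansen method; modify the regression equation |
| Multicollinearity: two or more independent variables are highly correlated. | Inflated standard errors; t-stats of coefficients artificially small. High $R^2$ despite low t-stats. | Omit one or more of the independent variables |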
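As a rough illustration of the serial-correlation test, this sketch computes the DW statistic directly from a list of regression residuals and compares it with the 2(1 - r) approximation. The residual series are invented, and the function names are hypothetical.

```python
def durbin_watson(residuals):
    """DW = sum over t >= 2 of (e_t - e_{t-1})^2, divided by the sum of all e_t^2."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def lag1_correlation(residuals):
    """Lag-1 autocorrelation r of residuals (their mean is assumed to be zero)."""
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, len(residuals)))
    return num / sum(e ** 2 for e in residuals)


clustered = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]  # runs of same-signed errors
alternating = [1.0, -1.0, 1.0, -1.0]           # sign flips every period

for name, e in [("clustered", clustered), ("alternating", alternating)]:
    r = lag1_correlation(e)
    print(name, round(durbin_watson(e), 3), "approx", round(2 * (1 - r), 3))
```

A DW value well below 2 (the clustered series) points to positive serial correlation, and a value above 2 (the alternating series) to negative serial correlation. The 2(1 - r) approximation tightens only as the sample grows; in these tiny toy series it is rough.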

#### 03-12-19 Essential Concept 6: Linear vs Log-Linear Trend Models

When the dependent variable changes by a constant amount over time, a linear trend model is used. The linear trend equation is given by

$$y_t = b_0 + b_1 t + \varepsilon_t, \quad t = 1, 2, \dots, T$$

When the dependent variable changes at a constant rate (grows exponentially), a log-linear trend model is used. The log-linear trend equation is given by

$$\ln y_t = b_0 + b_1 t + \varepsilon_t, \quad t = 1, 2, \dots, T$$

A limitation of trend models is that, by their nature, their errors tend to exhibit serial correlation, which limits their usefulness. The Durbin-Watson statistic is used to test for serial correlation. If this statistic differs significantly from 2, then we can conclude the presence of serial…
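A minimal sketch of fitting both trend models by ordinary least squares on the time index t = 1, ..., T. The log-linear model is the same regression run on ln(y). Both series below are hypothetical: one grows by a constant amount and the other at a constant 5% rate.

```python
import math


def ols_on_time(y):
    """Regress y on t = 1, 2, ..., T and return (b0, b1)."""
    T = len(y)
    t = range(1, T + 1)
    t_bar, y_bar = (T + 1) / 2, sum(y) / T
    b1 = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) / sum(
        (ti - t_bar) ** 2 for ti in t
    )
    return y_bar - b1 * t_bar, b1


def log_linear_trend(y):
    """Fit ln(y_t) = b0 + b1*t; every y_t must be positive."""
    return ols_on_time([math.log(v) for v in y])


linear_series = [3 + 2 * t for t in range(1, 11)]
growth_series = [100 * 1.05 ** t for t in range(1, 11)]

print(ols_on_time(linear_series))  # slope ~ 2 per period
b0, b1 = log_linear_trend(growth_series)
print(math.exp(b1) - 1)            # implied growth rate ~ 0.05 per period
```

In the log-linear fit, the slope is a continuously compounded growth rate, so `exp(b1) - 1` recovers the per-period growth rate.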

#### 03-12-19 Essential Concept 7: Autoregressive (AR) Models

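An autoregressive (AR) time series model is a linear model that predicts the current value of a time series from its own past values. In an AR(1) model, the independent variable is the most recent past value:

$$x_t = b_0 + b_1 x_{t-1} + \varepsilon_t$$

An AR model of order p, denoted AR(p), uses p lags of a time series to predict its current value.

The chain rule of forecasting is used to produce successive forecasts. The one-period-ahead forecast of $x_t$ from an AR(1) model is

$$\hat{x}_{t+1} = \hat{b}_0 + \hat{b}_1 x_t$$

$\hat{x}_{t+1}$ can then be used to forecast the two-period-ahead value $\hat{x}_{t+2}$:

$$\hat{x}_{t+2} = \hat{b}_0 + \hat{b}_1 \hat{x}_{t+1}$$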
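The chain rule of forecasting amounts to a loop in which each forecast becomes the input for the next. This sketch assumes the AR(1) coefficients have already been estimated; the values b0 = 1.0, b1 = 0.5 and the last observation 4.0 are hypothetical.

```python
def ar1_forecasts(b0, b1, x_t, horizon):
    """Chain rule of forecasting for an AR(1): each forecast feeds the next one."""
    forecasts = []
    current = x_t
    for _ in range(horizon):
        current = b0 + b1 * current  # x_hat_{t+h} = b0 + b1 * x_hat_{t+h-1}
        forecasts.append(current)
    return forecasts


# Hypothetical estimates b0 = 1.0, b1 = 0.5 and last observed value x_t = 4.0.
print(ar1_forecasts(1.0, 0.5, 4.0, horizon=3))  # [3.0, 2.5, 2.25]
```

Because |b1| < 1 here, successive forecasts move toward b0 / (1 - b1) = 2.0, the mean-reverting level of this hypothetical AR(1).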

#### 03-12-19 Essential Concept 8: Supervised Machine Learning Algorithms

In penalized regression, the regression coefficients are chosen to minimize the sum of squared residuals plus a penalty term that increases with the number of included variables. A feature must therefore make a sufficient contribution to model fit to offset the penalty from including it. Because of this penalty, the model remains parsimonious, and only the most important variables for explaining Y remain in the model. A popular type of penalized regression is LASSO (least absolute shrinkage and selection operator), which minimizes

$$\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2 + \lambda \sum_{k=1}^{K}\left|\hat{b}_k\right|$$

where the hyperparameter $\lambda$ sets the size of the penalty.

Support vector machine (SVM) is a linear classifier that aims to find the optimal hyperplane: the one that separates the two sets of data points…
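To show how the penalty pushes weak features out of the model, here is a small sketch of LASSO fitted by cyclic coordinate descent with soft-thresholding. This is one standard fitting method, chosen as an assumption because the source does not say how LASSO is fitted. The sketch also assumes no intercept (centered features, demeaned y). The data are made up so that only the first feature drives y.

```python
def soft_threshold(rho, threshold):
    """Move rho toward zero by `threshold`; values inside the band become exactly 0."""
    if rho > threshold:
        return rho - threshold
    if rho < -threshold:
        return rho + threshold
    return 0.0


def lasso(X, y, lam, n_iter=100):
    """Minimize SSE + lam * sum(|b_j|) by cyclic coordinate descent.

    Assumptions: no intercept (columns of X centered, y demeaned) and no all-zero
    column. X is a list of observation rows.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho, z = 0.0, 0.0
            for i in range(n):
                # Partial residual: y_i minus the fit from every feature except j.
                partial = y[i] - sum(X[i][m] * b[m] for m in range(p) if m != j)
                rho += X[i][j] * partial
                z += X[i][j] ** 2
            # Threshold is lam / 2 because the SSE term carries no 1/2 factor.
            b[j] = soft_threshold(rho, lam / 2) / z
    return b


# Made-up, centered data in which only the first feature drives y (y = 3 * x1).
X = [[1, 1], [-1, 1], [2, -1], [-2, -1]]
y = [3, -3, 6, -6]
for lam in (0, 2, 100):
    print(lam, lasso(X, y, lam))
```

With λ = 0 the first coefficient equals the OLS slope of 3.0. With λ = 2 it shrinks to 2.9. With λ = 100 both coefficients are exactly zero, which is how the penalty keeps only features that contribute enough to the fit.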

#### 03-12-19 Essential Concept 9: Unsupervised Machine Learning Algorithms

Principal components analysis (PCA) is used to reduce highly correlated features into a few uncorrelated composite variables. A composite variable is a variable that combines two or more variables that are statistically strongly related to each other.

The K-means algorithm repeatedly partitions observations into k non-overlapping clusters. The number of clusters, k, is a hyperparameter whose value must be set by the researcher before learning begins. Each cluster is characterized by its centroid, and each observation is assigned to the cluster whose centroid is closest to that observation.

Hierarchical clustering algorithms create intermediate rounds of clusters in increasing or…
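A plain-Python sketch of the K-means loop: assign each observation to the nearest centroid, move each centroid to the mean of its members, and repeat until nothing changes. As an assumption for reproducibility, the centroids are seeded with the first k observations rather than chosen randomly. The points are hypothetical.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain K-means: alternate assignment and centroid-update steps until stable."""
    # Assumption: seed centroids with the first k observations (libraries usually
    # use random or k-means++ initialization instead).
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each observation joins the cluster with the closest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # no centroid moved, so the partition is stable
        centroids = new_centroids
    return labels, centroids


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
print(centroids)
```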

#### 03-12-19 Essential Concept 10: Data Prep & Wrangling

Structured data

For structured data, data preparation and wrangling involve data cleansing and data preprocessing.

Data cleansing involves resolving:

- Incompleteness errors: Data is missing.
- Invalidity errors: Data is outside a meaningful range.
- Inaccuracy errors: Data is not a measure of the true value.
- Inconsistency errors: Data conflicts with the corresponding data points or with reality.
- Non-uniformity errors: Data is not present in an identical format.
- Duplication errors: Duplicate observations are present.

Data preprocessing involves performing the following transformations:

- Extraction: A new variable is extracted from an existing variable for ease of analysis and for use in training the ML model.
- Aggregation: Two or…
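A small sketch of cleansing checks and one extraction step on structured records. The records, the field names (`id`, `income`, `dob`), and the rule that income cannot be negative are all hypothetical. Only three of the listed error types (duplication, incompleteness, invalidity) are checked.

```python
from datetime import date

# Hypothetical records; field names and values are illustrative only.
records = [
    {"id": 1, "income": 52000, "dob": date(1985, 4, 12)},
    {"id": 2, "income": None, "dob": date(1990, 7, 3)},    # missing income
    {"id": 3, "income": -4000, "dob": date(1978, 1, 25)},  # negative income
    {"id": 1, "income": 52000, "dob": date(1985, 4, 12)},  # repeated record
]


def find_issues(rows):
    """Flag duplication, incompleteness, and invalidity errors (one label per row)."""
    issues = []
    seen = set()
    for i, row in enumerate(rows):
        key = tuple(sorted(row.items()))
        if key in seen:
            issues.append((i, "duplication"))
            continue
        seen.add(key)
        if any(value is None for value in row.values()):
            issues.append((i, "incompleteness"))
        elif row["income"] < 0:  # assumed validity rule: income cannot be negative
            issues.append((i, "invalidity"))
    return issues


def extract_age(dob, as_of):
    """Extraction: derive a new 'age' variable from the existing date of birth."""
    birthday_not_reached = (as_of.month, as_of.day) < (dob.month, dob.day)
    return as_of.year - dob.year - birthday_not_reached


print(find_issues(records))
# [(1, 'incompleteness'), (2, 'invalidity'), (3, 'duplication')]
print([extract_age(r["dob"], date(2019, 12, 3)) for r in records])
# [34, 29, 41, 34]
```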