A quick note on the predictive modeling process
In predictive modeling, we mainly focus on regression and classification problems. A regression problem predicts the value of a continuous dependent variable using one or more independent variables, known as predictors. A classification problem, on the other hand, predicts a categorical target variable using continuous or categorical predictors. Many statistical and machine learning methods are available in the literature for these problems. Methods that assume a specific functional relationship between the predictors and the target variable before modeling are known as parametric methods. In contrast, methods that make no such assumption beforehand and instead learn the relationship from the data are known as nonparametric methods.
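To make the distinction concrete, here is a minimal sketch in Python using scikit-learn; the synthetic data, model choices, and settings are purely illustrative assumptions. A linear regression commits to a linear functional form up front, while a k-nearest neighbors regressor lets the data shape the fitted relationship.

```python
# Parametric vs. nonparametric regression on the same synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))              # one continuous predictor
y = np.sin(X).ravel() + rng.normal(0, 0.2, 200)    # continuous target

# Parametric: assumes y is (approximately) b0 + b1 * x before seeing the data.
parametric = LinearRegression().fit(X, y)

# Nonparametric: no assumed functional form; predictions come from neighbors.
nonparametric = KNeighborsRegressor(n_neighbors=5).fit(X, y)

print(parametric.score(X, y), nonparametric.score(X, y))
```

On data like this, where the true relationship is nonlinear, the nonparametric model fits better precisely because it never committed to a linear form.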
When applying any of these methods to data, one usually aims to build a model that achieves optimal performance on unseen data: for instance, maximizing the coefficient of determination or minimizing the root mean squared error on the test data in regression modeling. Similarly, in classification modeling, one generally maximizes accuracy, sensitivity, or specificity on the test data. For unbalanced data, which metric you optimize depends on the overall goal you want to achieve. In general, the aim is to make sure that the developed model generalizes well to unseen data. In fact, training these machine learning models narrows down to the problem of finding the best values of the parameters. Hence, there is an optimization procedure in the model building process, no matter which technique you use.
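As a rough illustration of scoring on unseen data, the following sketch (again with made-up data and an arbitrary model choice) fits on a training split and reports the coefficient of determination and root mean squared error on the held-out test split.

```python
# Fit on a training split, then evaluate on data the model never saw.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 0.5, 300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

print("R^2 on test data: ", r2_score(y_test, pred))
print("RMSE on test data:", mean_squared_error(y_test, pred) ** 0.5)
```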
For instance, you minimize the sum of squared errors if you fit a least-squares regression. Likewise, you minimize the misclassification error, Gini impurity, or entropy if you perform classification analysis using tree-based models. While working with these problems, one is always interested in achieving global optimality. Roughly speaking, the global optimum is the best value the function can take over all possible values in its domain, whereas a local optimum is the best value the function can take within some neighborhood. The global optimum is guaranteed when we work with a convex function whose constraint set is convex. By a convex set, we mean a set that contains the line segment joining any two of its points. This class of optimization problems is known as convex optimization problems. However, for other classes of optimization problems, the global optimum cannot be guaranteed. Therefore, we need to pay attention to possible nonconvex problems while working with predictive models.
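A tiny example of why this matters: minimizing a simple nonconvex function with scipy from different starting points can return different local minima, none of which is guaranteed to be the global one. The function below is arbitrary, chosen only because it has several local minima.

```python
# Different starting points can land in different local minima
# when the objective is nonconvex.
import numpy as np
from scipy.optimize import minimize

def f(x):
    # Nonconvex objective: a sine wave plus a mild quadratic bowl.
    return np.sin(3 * x[0]) + 0.1 * x[0] ** 2

for x0 in (-2.0, 0.0, 2.0):
    res = minimize(f, x0=[x0])
    print(f"start {x0:5.1f} -> x* = {res.x[0]:7.3f}, f(x*) = {res.fun:7.3f}")
```

For a convex loss such as least squares, any of these runs would end at the same minimizer; here, the answer depends on where you start.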
In addition, most machine learning libraries fit these models with iterative methods, so many iterations may be needed to minimize the loss function and obtain the best parameter values. Since several solvers are often available, having a basic knowledge of the solvers and their limitations can be helpful. Some methods also converge faster than others; for example, first-order methods tend to be slower than second-order methods. Changing the solver or increasing the number of iterations can often fix convergence issues. Finally, although computation time may not be an issue with small data, the time difference can be significant when performing grid search cross-validation on a larger data set.
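As one concrete, illustrative case of these fixes, scikit-learn's LogisticRegression exposes both a solver argument and a max_iter budget; the sketch below (on synthetic data) shows the two usual remedies when a fit fails to converge.

```python
# Convergence fixes in scikit-learn: more iterations, or a different solver.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# The default lbfgs solver (a quasi-Newton method) with a deliberately
# tiny iteration budget will typically raise a ConvergenceWarning.
clf = LogisticRegression(solver="lbfgs", max_iter=5).fit(X, y)

# Fix 1: give the same solver a larger iteration budget.
clf_more_iter = LogisticRegression(solver="lbfgs", max_iter=1000).fit(X, y)

# Fix 2: switch to a different solver.
clf_other = LogisticRegression(solver="liblinear").fit(X, y)
```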
Predictive modeling is a vast area, and much care is needed to build an effective, ready-to-go model. We may talk more about a specific case in another post.