Normal Distribution
Formula:
f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}
Description: The normal distribution is a continuous probability distribution characterized by a bell-shaped curve. It is defined by the mean \mu and standard deviation \sigma.
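A minimal sketch of evaluating this density with scipy.stats; the parameter values are assumed for illustration:

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 0.0, 1.0                        # assumed mean and standard deviation
x = np.linspace(-4, 4, 9)
print(norm.pdf(x, loc=mu, scale=sigma))     # density f(x) at each point
print(norm.cdf(1.96, loc=mu, scale=sigma))  # P(X <= 1.96)
```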
Binomial Distribution
Formula:
P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}
Description: The binomial distribution represents the number of successes in a fixed number of independent Bernoulli trials, with a constant probability of success p in each trial. Here, n is the number of trials and k is the number of successes.
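A quick check of the formula with scipy.stats, using assumed values for n and p:

```python
from scipy.stats import binom

n, p = 10, 0.5             # assumed number of trials and success probability
print(binom.pmf(3, n, p))  # P(X = 3) = C(10,3) * 0.5^3 * 0.5^7
```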
Poisson Distribution
Formula:
P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}
Description: The Poisson distribution represents the probability of a given number of events occurring in a fixed interval of time or space, given the average number of times the event occurs over that interval. Here, \lambda is the average number of events, k is the number of occurrences, and e is Euler’s number.
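A one-line sketch with scipy.stats, assuming an example event rate:

```python
from scipy.stats import poisson

lam = 4.0                   # assumed average number of events per interval
print(poisson.pmf(2, lam))  # P(X = 2) = lam^2 * e^(-lam) / 2!
```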
Beta Distribution
Formula:
f(x) = \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)} \quad \text{for } x \in [0, 1]
Description: The beta distribution is a continuous distribution defined on the interval [0, 1], parameterized by \alpha and \beta, and is useful in Bayesian statistics.
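A small sketch with scipy.stats, using assumed shape parameters:

```python
from scipy.stats import beta

a, b = 2.0, 5.0             # assumed alpha and beta
print(beta.pdf(0.3, a, b))  # density on [0, 1]
print(beta.mean(a, b))      # alpha / (alpha + beta)
```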
Gamma Distribution
Formula:
f(x) = \frac{\beta^\alpha x^{\alpha - 1} e^{-\beta x}}{\Gamma(\alpha)} \quad \text{for } x > 0
Description: The gamma distribution is a continuous distribution defined by shape parameter \alpha and rate parameter \beta. It generalizes the exponential distribution.
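A sketch with scipy.stats; note scipy parameterizes the gamma by shape and scale, where scale = 1/\beta:

```python
from scipy.stats import gamma

alpha, rate = 2.0, 3.0                        # assumed shape and rate
print(gamma.pdf(1.0, alpha, scale=1 / rate))  # scipy uses scale = 1/beta
```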
Chi-Squared Distribution
Formula:
f(x) = \frac{1}{2^{k/2} \Gamma(k/2)} x^{k/2 - 1} e^{-x/2} \quad \text{for } x > 0
Description: The chi-squared distribution is a special case of the gamma distribution with \alpha = k/2 and \beta = 1/2, where k is the degrees of freedom. It is often used in hypothesis testing and confidence intervals.
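A sketch of the typical hypothesis-testing use with scipy.stats (degrees of freedom assumed):

```python
from scipy.stats import chi2

k = 3                        # assumed degrees of freedom
print(chi2.pdf(2.0, df=k))   # density at x = 2
print(chi2.ppf(0.95, df=k))  # 95% critical value, as used in tests
```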
t-Distribution
Formula:
f(t) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu\pi}\, \Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu + 1}{2}}
Description: The t-distribution is used to estimate population parameters when the sample size is small and the population variance is unknown. It is defined by the degrees of freedom \nu.
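A sketch with scipy.stats, assuming an example \nu; the ppf call gives the critical value used in small-sample confidence intervals:

```python
from scipy.stats import t

nu = 10                     # assumed degrees of freedom
print(t.pdf(0.0, df=nu))    # density at t = 0
print(t.ppf(0.975, df=nu))  # two-sided 95% critical value
```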
Multinomial Distribution
Formula:
P(X_1 = k_1, \ldots, X_m = k_m) = \frac{n!}{k_1! \cdots k_m!} p_1^{k_1} \cdots p_m^{k_m}
Description: The multinomial distribution generalizes the binomial distribution to more than two outcomes. It describes the probabilities of counts among categories. Here, n is the number of trials, k_i is the count for category i, and p_i is the probability of category i.
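A sketch with scipy.stats; the category probabilities and counts are assumed (counts must sum to n):

```python
from scipy.stats import multinomial

# three categories with assumed probabilities; counts sum to n = 10
print(multinomial.pmf([3, 4, 3], n=10, p=[0.2, 0.5, 0.3]))
```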
Geometric Distribution
Formula:
P(X = k) = (1 - p)^{k-1} p \quad \text{for } k \in \{1, 2, 3, \ldots\}
Description: The geometric distribution represents the number of trials needed to get the first success in a sequence of independent Bernoulli trials with success probability p.
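A sketch with scipy.stats, whose geom uses the same support k = 1, 2, 3, ... as the formula above:

```python
from scipy.stats import geom

p = 0.3                  # assumed success probability
print(geom.pmf(4, p))    # P(first success on trial 4) = (1-p)^3 * p
```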
Hypergeometric Distribution
Formula:
P(X = k) = \frac{\binom{K}{k} \binom{N - K}{n - k}}{\binom{N}{n}}
Description: The hypergeometric distribution describes the probability of k successes in n draws from a finite population of size N containing K successes, without replacement.
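A sketch with scipy.stats; note scipy's argument order is (k, population size, successes in population, draws), matching N, K, n above:

```python
from scipy.stats import hypergeom

N, K, n = 50, 10, 5               # assumed population, successes, draws
print(hypergeom.pmf(2, N, K, n))  # P(k = 2 successes in the sample)
```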
Log-Normal Distribution
Formula:
f(x) = \frac{1}{x\sigma\sqrt{2\pi}} e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}} \quad \text{for } x > 0
Description: The log-normal distribution describes a variable whose logarithm is normally distributed. It is useful in modeling positively skewed data.
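A sketch with scipy.stats; scipy parameterizes lognorm by shape s = \sigma and scale = e^{\mu}:

```python
import numpy as np
from scipy.stats import lognorm

mu, sigma = 0.0, 0.5                              # parameters of the underlying normal
x = np.linspace(0.1, 5, 5)
print(lognorm.pdf(x, s=sigma, scale=np.exp(mu)))  # scipy's parameterization
```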
Generalized Linear Model (GLM)
Formula:
g(E(Y)) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n
Description: A generalized linear model is a flexible generalization of ordinary linear regression that allows for the dependent variable Y to have a distribution other than normal. The link function g relates the expected value of the response variable E(Y) to the linear predictors. \beta_0 is the intercept, and \beta_i are the coefficients for the predictor variables x_i.
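As a sketch, a Poisson GLM with a log link fitted via statsmodels on synthetic data; the data-generating coefficients 0.5 and 0.8 are assumed for illustration:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 2, 100)
y = rng.poisson(np.exp(0.5 + 0.8 * x))  # counts with a log-link relationship

X = sm.add_constant(x)                  # adds the intercept column
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()  # log link by default
print(fit.params)                       # estimates of beta_0, beta_1
```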
Generalized Additive Model (GAM)
Formula:
g(E(Y)) = \beta_0 + f_1(x_1) + f_2(x_2) + \cdots + f_n(x_n)
Description: A generalized additive model is an extension of generalized linear models where the linear predictor depends linearly on unknown smooth functions of some predictor variables, allowing for non-linear relationships between the dependent and independent variables. Here, g is the link function, E(Y) is the expected value of the response variable Y, \beta_0 is the intercept, and f_i are smooth functions of the predictor variables x_i.
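A sketch assuming the pygam package is available; the smooth terms s(0) and s(1) play the role of f_1 and f_2, and the data are synthetic:

```python
import numpy as np
from pygam import LinearGAM, s  # assumes pygam is installed

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (200, 2))
y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.1, 200)

gam = LinearGAM(s(0) + s(1)).fit(X, y)  # one smooth f_i per predictor
gam.summary()
```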
Decision Tree
Formula: Recursive binary splitting
Description: Splits the data into subsets based on the value of input features. Each internal node represents a “test” on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label or continuous value.
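A minimal sketch with scikit-learn; the Iris data and the depth limit are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(clf.predict(X[:5]))  # class labels reached via the learned splits
```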
Random Forest
Formula: Aggregated decision trees
Description: Combines the predictions of multiple decision trees to improve accuracy and control over-fitting. Each tree is trained on a bootstrapped sample of the data and uses a random subset of features.
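A minimal sketch with scikit-learn; max_features="sqrt" is the random feature subset per split, and the data choice is illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(
    n_estimators=100, max_features="sqrt", random_state=0
).fit(X, y)               # each tree sees a bootstrap sample
print(rf.predict(X[:5]))  # majority vote across the trees
```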
Support Vector Machine (SVM)
Formula:
f(x) = \text{sign}(\mathbf{w} \cdot \mathbf{x} + b)
Description: Finds the hyperplane that separates the classes with the maximum margin in the feature space. The formula represents the decision boundary, where \mathbf{w} is the weight vector and b is the bias.
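A minimal sketch with scikit-learn on synthetic two-class data; for a linear kernel, coef_ and intercept_ expose \mathbf{w} and b:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel="linear").fit(X, y)
print(clf.coef_, clf.intercept_)  # w and b of the decision boundary
print(clf.predict(X[:5]))         # sign(w . x + b) as class labels
```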
K-Nearest Neighbors (KNN)
Formula:
\hat{y} = \frac{1}{k} \sum_{i=1}^{k} y_i
Description: Classifies a data point based on the majority class among its k nearest neighbors. For regression, it predicts the average of the k nearest neighbors’ values.
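A minimal sketch with scikit-learn; k = 5 and the Iris data are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(knn.predict(X[:5]))  # majority class among the 5 nearest neighbors
```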
Naive Bayes
Formula:
P(Y|X) = \frac{P(X|Y)P(Y)}{P(X)}
Description: Assumes conditional independence between the predictors given the class. It uses Bayes’ theorem to predict the probability of a class given the predictors.
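A minimal sketch with scikit-learn's Gaussian variant, which models each predictor as class-conditionally normal; the data choice is illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
nb = GaussianNB().fit(X, y)
print(nb.predict_proba(X[:2]))  # P(Y | X) for each class via Bayes' theorem
```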
Principal Component Analysis (PCA)
Formula:
Z = XW
Description: Reduces the dimensionality of the data by transforming the original variables into new uncorrelated variables (principal components), ordered by the amount of variance they capture. In the formula, the columns of W are the leading eigenvectors of the data’s covariance matrix.
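A minimal sketch with scikit-learn on random data; note sklearn centers X before projecting, so Z = (X - mean) W:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
pca = PCA(n_components=2)
Z = pca.fit_transform(X)                # projects onto the top 2 components
print(pca.explained_variance_ratio_)   # variance captured per component
```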
Neural Networks
Formula:
a^{(l)} = \sigma\left(W^{(l)} a^{(l-1)} + b^{(l)}\right)
Description: Composed of layers of interconnected nodes (neurons). Each neuron’s output is a weighted sum of its inputs passed through an activation function \sigma. The parameters W^{(l)} and b^{(l)} are the weights and biases of layer l.
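A minimal numpy sketch of the forward pass through two layers; the weights are random placeholders, not trained values:

```python
import numpy as np

def sigma(z):                        # activation function (tanh here)
    return np.tanh(z)

rng = np.random.default_rng(0)
x = rng.normal(size=3)               # input vector
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # layer 1 parameters
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # layer 2 parameters

a1 = sigma(W1 @ x + b1)              # a^(1) = sigma(W^(1) x + b^(1))
a2 = sigma(W2 @ a1 + b2)             # a^(2) = sigma(W^(2) a^(1) + b^(2))
print(a2)
```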
Convolutional Neural Networks (CNN)
Formula:
h_{i,j} = \sigma\left(\sum_{m} \sum_{n} w_{m,n} \, x_{i+m,\, j+n} + b\right)
Description: Uses convolutional layers to apply filters to the input, which helps in capturing spatial hierarchies in data, particularly useful for image and video processing.
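A minimal numpy sketch of one filter sliding over a single-channel input, written as cross-correlation as is conventional in CNN libraries; the filter values are illustrative:

```python
import numpy as np

def conv2d(x, w, b=0.0):
    # Valid cross-correlation of a single-channel image x with filter w.
    H, W = x.shape
    kh, kw = w.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w) + b
    return out

x = np.arange(25, dtype=float).reshape(5, 5)
w = np.array([[1.0, 0.0], [0.0, -1.0]])  # a simple edge-like filter
print(conv2d(x, w))                      # one 4x4 feature map
```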
Recurrent Neural Networks (RNN)
Formula:
h_t = \sigma(W_h h_{t-1} + W_x x_t + b)
Description: Designed to recognize patterns in sequences of data by maintaining a hidden state h_t that captures information from previous time steps.
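A minimal numpy sketch unrolling the recurrence over a short sequence; the dimensions and weights are illustrative placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_in, d_h = 5, 3, 4
xs = rng.normal(size=(T, d_in))            # input sequence x_1 ... x_T
W_h = rng.normal(size=(d_h, d_h)) * 0.1
W_x = rng.normal(size=(d_h, d_in)) * 0.1
b = np.zeros(d_h)

h = np.zeros(d_h)                          # initial hidden state h_0
for x_t in xs:
    h = np.tanh(W_h @ h + W_x @ x_t + b)   # h_t from h_{t-1} and x_t
print(h)                                   # final hidden state h_T
```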
Gradient Boosting Machines (GBM)
Formula:
F_m(x) = F_{m-1}(x) + \eta \cdot h_m(x)
Description: Builds an additive model in a forward stage-wise manner. Each base learner h_m is trained to reduce the residual error of the ensemble’s previous predictions, and \eta is the learning rate that shrinks each learner’s contribution.
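A sketch of the stage-wise update using shallow scikit-learn trees as the base learners h_m; the data and hyperparameters are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 200)

eta = 0.1
F = np.full_like(y, y.mean())     # F_0: constant initial prediction
for m in range(100):
    h = DecisionTreeRegressor(max_depth=2).fit(X, y - F)  # fit the residuals
    F = F + eta * h.predict(X)    # F_m(x) = F_{m-1}(x) + eta * h_m(x)
print(np.mean((y - F) ** 2))      # training error shrinks stage by stage
```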