# Formulary

## Statistical Distributions

### Normal (Gaussian) Distribution

**Formula**: f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}

**Description**: The normal distribution is a continuous probability distribution with a bell-shaped density curve. It is defined by the mean \mu and the standard deviation \sigma.
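As a quick check of the density formula, here is a minimal sketch that evaluates it directly. The function name `normal_pdf` and its default arguments are illustrative choices, not part of the formulary.

```python
import math


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of the normal distribution with mean mu and std. dev. sigma at x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma  # standardized distance from the mean
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# The density is symmetric about mu and peaks there.
print(normal_pdf(0.0))
print(normal_pdf(1.0), normal_pdf(-1.0))
```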

### Binomial Distribution

**Formula**: P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}

**Description**: The binomial distribution represents the number of successes in a fixed number of independent Bernoulli trials, with a constant probability of success p in each trial. Here, n is the number of trials and k is the number of successes.
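The probability mass function can be evaluated term by term. The sketch below uses `math.comb` for the binomial coefficient; the name `binomial_pmf` is a hypothetical helper introduced only for illustration.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for n independent trials with success probability p."""
    if not 0 <= k <= n:
        return 0.0  # outside the support
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


# Probabilities over k = 0..n should sum to 1 (up to rounding).
probs = [binomial_pmf(k, n=4, p=0.5) for k in range(5)]
print(probs, sum(probs))
```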

### Poisson Distribution

**Formula**: P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}

**Description**: The Poisson distribution represents the probability of a given number of events occurring in a fixed interval of time or space, given the average number of times the event occurs over that interval. Here, \lambda is the average number of events, k is the number of occurrences, and e is Euler’s number.
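A minimal sketch of the Poisson probability mass function follows. The helper name `poisson_pmf` and the parameter name `lam` (standing in for \lambda) are assumptions made for the example.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) when events occur at an average of lam per interval."""
    if k < 0:
        return 0.0  # counts cannot be negative
    return lam**k * math.exp(-lam) / math.factorial(k)


# With lam = 2, compare the chance of seeing 0, 1, 2, or 3 events.
for k in range(4):
    print(k, poisson_pmf(k, lam=2.0))
```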

### Exponential Distribution

**Formula**: f(x) = \lambda e^{-\lambda x} \quad \text{for } x \ge 0

**Description**: The exponential distribution represents the time between events in a Poisson process. It is defined by the rate parameter \lambda.
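Integrating the density gives the CDF F(x) = 1 - e^{-\lambda x}, which can be inverted to draw samples from uniform random numbers (inverse-transform sampling). The sketch below assumes that approach; `exponential_quantile` and `lam` are illustrative names.

```python
import math
import random


def exponential_quantile(u: float, lam: float) -> float:
    """Inverse CDF of the exponential distribution: solves 1 - exp(-lam * x) = u."""
    if not 0.0 <= u < 1.0:
        raise ValueError("u must lie in [0, 1)")
    if lam <= 0:
        raise ValueError("lam must be positive")
    return -math.log(1.0 - u) / lam


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [exponential_quantile(rng.random(), lam=2.0) for _ in range(10_000)]
sample_mean = sum(samples) / len(samples)
print(sample_mean)  # should land near the theoretical mean 1 / lam
```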

### Uniform Distribution

**Formula**: f(x) = \begin{cases} \frac{1}{b - a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}

**Description**: The continuous uniform distribution assigns the same probability density to every value in the interval [a, b] and zero density outside it.

### Bernoulli Distribution

**Formula**: P(X = x) = p^x (1 - p)^{1-x} \quad \text{for } x \in \{0, 1\}

**Description**: The Bernoulli distribution is a discrete distribution representing the outcome of a single binary experiment with success probability p.

### Beta Distribution

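**Formula**: f(x) = \frac{x^{\alpha-1} (1-x)^{\beta-1}}{B(\alpha, \beta)} \quad \text{for } 0 \le x \le 1

**Description**: The beta distribution is a continuous distribution defined on the interval [0, 1] and parameterized by the shape parameters \alpha and \beta. B(\alpha, \beta) is the beta function, which normalizes the density. In Bayesian statistics it is commonly used as the conjugate prior for the success probability of Bernoulli and binomial models.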

### Gamma Distribution

**Formula**: f(x) = \frac{\beta^\alpha x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)} \quad \text{for } x \ge 0

**Description**: The gamma distribution is a continuous distribution defined by shape parameter \alpha and rate parameter \beta. It generalizes the exponential distribution, which is the special case \alpha = 1 with rate \lambda = \beta.

### Chi-Squared Distribution

**Formula**: f(x) = \frac{1}{2^{k/2} \Gamma(k/2)} x^{k/2 - 1} e^{-x/2} \quad \text{for } x \ge 0

**Description**: The chi-squared distribution with k degrees of freedom is a special case of the gamma distribution with \alpha = k/2 and \beta = 1/2. It is often used in hypothesis testing and in constructing confidence intervals.
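The special-case relationship can be checked numerically by evaluating both densities at the same point. This is a sketch under the shape/rate parameterization used in this formulary; the function names are hypothetical, and both functions are restricted to x > 0 to avoid the boundary case.

```python
import math


def gamma_pdf(x: float, alpha: float, beta: float) -> float:
    """Gamma density with shape alpha and rate beta, for x > 0."""
    return beta**alpha * x ** (alpha - 1.0) * math.exp(-beta * x) / math.gamma(alpha)


def chi2_pdf(x: float, k: int) -> float:
    """Chi-squared density with k degrees of freedom, for x > 0."""
    half_k = k / 2.0
    return x ** (half_k - 1.0) * math.exp(-x / 2.0) / (2.0**half_k * math.gamma(half_k))


# Both expressions should agree when alpha = k / 2 and beta = 1 / 2.
x, k = 3.0, 4
print(chi2_pdf(x, k), gamma_pdf(x, alpha=k / 2.0, beta=0.5))
```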

### Student’s t-Distribution

**Formula**: f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi} \Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}

**Description**: The t-distribution is used for inference about a population mean when the sample size is small and the population variance is unknown. It is defined by the degrees of freedom \nu.

### F-Distribution

**Formula**: f(x) = \frac{\left(\frac{d_1}{d_2}\right)^{d_1/2} x^{d_1/2 - 1}}{B\left(\frac{d_1}{2}, \frac{d_2}{2}\right) \left(1 + \frac{d_1}{d_2} x\right)^{(d_1 + d_2)/2}}

**Description**: The F-distribution is used to compare two variances and is defined by two degrees of freedom, d_1 and d_2. B is the beta function.

### Multinomial Distribution

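**Formula**: P(X_1 = x_1, \ldots, X_k = x_k) = \frac{n!}{x_1! \cdots x_k!} p_1^{x_1} \cdots p_k^{x_k}

**Description**: The multinomial distribution generalizes the binomial distribution to more than two outcomes. It gives the probability of observing the counts x_1, \ldots, x_k across k categories in n independent trials, where p_i is the probability of category i, x_1 + \cdots + x_k = n, and p_1 + \cdots + p_k = 1.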

### Geometric Distribution

**Formula**: P(X = k) = (1 - p)^{k-1} p \quad \text{for } k \in \{1, 2, 3, \ldots\}

**Description**: The geometric distribution represents the number of trials needed to get the first success in a sequence of independent Bernoulli trials with success probability p.

### Hypergeometric Distribution

**Formula**: P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}

**Description**: The hypergeometric distribution describes the probability of k successes in n draws, without replacement, from a finite population of size N that contains K success items.
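The three binomial coefficients map directly onto `math.comb`. In this sketch the argument names mirror the symbols in the formula, and `hypergeom_pmf` is a hypothetical helper name.

```python
import math


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k): k successes in n draws without replacement from N items, K of them successes."""
    if k < 0 or k > K or n - k < 0 or n - k > N - K:
        return 0.0  # impossible combination of counts
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)


# Drawing 2 items from 5, of which 2 are successes: chance of exactly one success.
print(hypergeom_pmf(1, N=5, K=2, n=2))
```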

### Log-Normal Distribution

**Formula**: f(x) = \frac{1}{x\sigma\sqrt{2\pi}} e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}} \quad \text{for } x > 0

**Description**: The log-normal distribution describes a variable whose logarithm is normally distributed with mean \mu and standard deviation \sigma. It is useful in modeling positively skewed data.

## Machine Learning Models

### Linear Regression

**Formula**: y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \epsilon

**Description**: Predicts a continuous target variable y as a linear combination of one or more predictor variables x_i. Here, \beta_0 is the intercept, \beta_i are the coefficients, and \epsilon is the error term.

### Logistic Regression

**Formula**: \text{logit}(P(Y=1)) = \ln\left(\frac{P(Y=1)}{1 - P(Y=1)}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n

**Description**: Predicts a binary outcome by modeling the log-odds of the outcome as a linear function of the predictor variables.
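Solving the logit equation for P(Y=1) gives the sigmoid of the linear predictor. The sketch below shows that inversion with made-up coefficients; `predict_proba`, `sigmoid`, and `logit` are illustrative helper names, and the code is not hardened against overflow for very large negative inputs.

```python
import math


def sigmoid(z: float) -> float:
    """Inverse of the logit function: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def predict_proba(x, beta0, betas):
    """P(Y = 1 | x) for intercept beta0 and coefficients betas."""
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))  # linear predictor (log-odds)
    return sigmoid(z)


# Hypothetical coefficients, chosen only to illustrate the round trip.
p = predict_proba([1.0, 2.0], beta0=0.5, betas=[0.5, 0.25])
print(p, logit(p))  # logit(p) recovers the linear predictor
```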

### Generalized Linear Model (GLM)

**Formula**: g(E(Y)) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n

**Description**: A generalized linear model is a flexible generalization of ordinary linear regression that allows the dependent variable Y to have a distribution other than normal. The link function g relates the expected value of the response variable E(Y) to the linear predictor. \beta_0 is the intercept, and \beta_i are the coefficients for the predictor variables x_i.

### Generalized Additive Model (GAM)

**Formula**: g(E(Y)) = \beta_0 + f_1(x_1) + f_2(x_2) + \cdots + f_n(x_n)

**Description**: A generalized additive model extends the generalized linear model by replacing the linear terms with unknown smooth functions of the predictor variables, which allows for non-linear relationships between the dependent and independent variables. Here, g is the link function, E(Y) is the expected value of the response variable Y, \beta_0 is the intercept, and f_i are the smooth functions of the predictor variables x_i.

### Decision Tree

**Formula**: Recursive binary splitting

**Description**: Splits the data into subsets based on the value of input features. Each internal node represents a “test” on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label or continuous value.

### Random Forest

**Formula**: Aggregated decision trees

**Description**: Combines the predictions of multiple decision trees to improve accuracy and reduce overfitting. Each tree is trained on a bootstrapped sample of the data and uses a random subset of features.

### Support Vector Machine (SVM)

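**Formula**: f(x) = \text{sign}(\mathbf{w} \cdot \mathbf{x} + b)

**Description**: Finds the hyperplane that separates the classes with the largest margin in the feature space. The formula is the decision function: the sign of \mathbf{w} \cdot \mathbf{x} + b gives the predicted class, and the decision boundary is the hyperplane \mathbf{w} \cdot \mathbf{x} + b = 0. Here, \mathbf{w} is the weight vector and b is the bias.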

### K-Nearest Neighbors (KNN)

**Formula**: \hat{y} = \frac{1}{k} \sum_{i=1}^{k} y_i

**Description**: Predicts from the k training points nearest to the query point. The formula is the regression case: the prediction is the average of the k nearest neighbors’ values y_i. For classification, the prediction is instead the majority class among the k nearest neighbors.
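Both the averaging rule and the majority vote can be sketched with a brute-force neighbor search. The example assumes Euclidean distance (the formulary does not name a metric), and the toy data and function names are invented for illustration.

```python
import math
from collections import Counter


def nearest_targets(train_X, train_y, query, k):
    """Targets of the k training points closest to query (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return [train_y[i] for i in order[:k]]


def knn_regress(train_X, train_y, query, k):
    """Regression: average of the k nearest neighbors' values."""
    targets = nearest_targets(train_X, train_y, query, k)
    return sum(targets) / len(targets)


def knn_classify(train_X, train_y, query, k):
    """Classification: majority class among the k nearest neighbors."""
    targets = nearest_targets(train_X, train_y, query, k)
    return Counter(targets).most_common(1)[0][0]


# Toy data, made up for the example.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
y_values = [1.0, 2.0, 3.0, 10.0, 12.0]
y_classes = ["a", "a", "a", "b", "b"]

print(knn_regress(X, y_values, (0.2, 0.1), k=3))
print(knn_classify(X, y_classes, (0.2, 0.1), k=3))
```

A sorted scan over all training points costs O(n log n) per query, which is fine for a sketch; larger datasets usually call for a spatial index.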

### Naive Bayes

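**Formula**: P(Y|X) = \frac{P(X|Y)P(Y)}{P(X)}

**Description**: Uses Bayes’ theorem to predict the probability of a class Y given the predictors X. The "naive" assumption is that the predictors are conditionally independent given the class, so that P(X|Y) factors into the product of the per-predictor terms P(x_i|Y).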
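Under the conditional-independence assumption, the posterior is the prior times a product of per-feature likelihoods, normalized by P(X). The sketch below works from probabilities that are simply given; the class names and numbers are made up, and estimating them from data is outside its scope.

```python
import math


def naive_bayes_posterior(priors, likelihoods):
    """Posterior P(Y | X) per class.

    priors:      {class: P(Y)}
    likelihoods: {class: [P(x_1 | Y), P(x_2 | Y), ...]} for the observed features
    """
    # Numerator of Bayes' theorem, with P(X | Y) factored over features.
    scores = {c: priors[c] * math.prod(likelihoods[c]) for c in priors}
    evidence = sum(scores.values())  # P(X), the normalizing constant
    return {c: s / evidence for c, s in scores.items()}


# Hypothetical two-class example with two observed features.
posterior = naive_bayes_posterior(
    priors={"spam": 0.4, "ham": 0.6},
    likelihoods={"spam": [0.8, 0.5], "ham": [0.1, 0.5]},
)
print(posterior)
```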

### Principal Component Analysis (PCA)

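**Formula**: Z = XW

**Description**: Reduces the dimensionality of the data by transforming the original variables into new uncorrelated variables (principal components), ordered by the amount of variance they capture. Here, X is the mean-centered data matrix, the columns of W are the eigenvectors of its covariance matrix, and Z holds the data expressed in principal-component coordinates.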

### K-Means Clustering

**Formula**: \arg \min_S \sum_{i=1}^{k} \sum_{x \in S_i} \| x - \mu_i \|^2

**Description**: Partitions the data into k clusters S = \{S_1, \ldots, S_k\} by minimizing the sum of squared distances between the data points and their cluster centroids \mu_i.
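The objective is usually minimized heuristically rather than exactly. The sketch below uses Lloyd's algorithm (alternating assignment and centroid updates), which is one common choice and not something the formulary itself specifies; the initial centroids are passed in by the caller, and all function names are illustrative.

```python
import math


def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]


def update(points, labels, centroids):
    """Recompute each centroid as the mean of its assigned points."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            new_centroids.append(old)  # keep an empty cluster's centroid unchanged
            continue
        new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centroids


def objective(points, labels, centroids):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))


def lloyd(points, centroids, max_iter=100):
    """Alternate assignment and update until the centroids stop moving."""
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids = lloyd(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels, centroids, objective(points, labels, centroids))
```

Lloyd's algorithm only finds a local minimum, so results depend on the starting centroids.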

### Neural Networks

**Formula**: z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}, \quad a^{(l)} = \sigma(z^{(l)})

**Description**: Composed of layers of interconnected nodes (neurons). Each neuron’s output is a weighted sum of its inputs passed through an activation function \sigma. The parameters W^{(l)} and b^{(l)} are the weights and biases of layer l, z^{(l)} is the layer's pre-activation, and a^{(l)} is its output.
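A single layer's forward pass follows the two equations directly: compute the pre-activation z, then apply \sigma. This sketch assumes a logistic sigmoid for \sigma and stores the weight matrix as a list of rows; both are illustrative choices.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dense_forward(W, b, a_prev):
    """Forward pass of one layer: z = W a_prev + b, a = sigmoid(z).

    W is a list of rows (one row per neuron); b and a_prev are lists of floats.
    """
    z = [sum(w * a for w, a in zip(row, a_prev)) + bias for row, bias in zip(W, b)]
    a = [sigmoid(v) for v in z]
    return z, a


# Two inputs feeding a layer of two neurons, with made-up parameters.
z, a = dense_forward(W=[[1.0, -1.0], [2.0, 0.0]], b=[0.0, 1.0], a_prev=[3.0, 3.0])
print(z, a)
```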

### Convolutional Neural Networks (CNN)

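**Formula**: (f * g)(t) = \int_{-\infty}^{\infty} f(\tau)g(t - \tau) \, d\tau

**Description**: Uses convolutional layers to apply filters to the input, which helps in capturing spatial hierarchies in data and is particularly useful for image and video processing. The formula is the continuous convolution of two functions f and g; convolutional layers apply a discrete analogue in which a small filter slides over the input.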

### Recurrent Neural Networks (RNN)

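**Formula**: h_t = \sigma(W_h h_{t-1} + W_x x_t + b)

**Description**: Designed to recognize patterns in sequences of data by maintaining a hidden state h_t that captures information from previous time steps. Here, x_t is the input at time step t, W_h and W_x are the recurrent and input weight matrices, b is the bias, and \sigma is the activation function.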

### Gradient Boosting Machines (GBM)

**Formula**: F_m(x) = F_{m-1}(x) + \eta \cdot h_m(x)

**Description**: Builds an additive model in a forward stage-wise manner. Each base learner h_m is trained to reduce the residual error of the ensemble’s previous predictions F_{m-1}, and its contribution is scaled by the learning rate \eta.
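The stage-wise update can be sketched for squared-error loss on one-dimensional inputs, where each base learner is a decision stump fit to the current residuals. The choice of stumps, the loss, the defaults, and every function name below are assumptions made for illustration.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split of the residuals: returns (threshold, left_mean, right_mean).

    Assumes xs contains at least two distinct values.
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # the largest x would leave the right side empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]


def boost(xs, ys, n_rounds=20, eta=0.5):
    """Fit F_0 (the mean) and then n_rounds stumps, each on the current residuals."""
    f0 = sum(ys) / len(ys)
    preds = [f0] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        # F_m(x) = F_{m-1}(x) + eta * h_m(x)
        preds = [p + eta * (lv if x <= t else rv) for x, p in zip(xs, preds)]
    return f0, stumps


def predict(x, f0, stumps, eta=0.5):
    """Evaluate the boosted model; eta must match the value used in boost."""
    return f0 + sum(eta * (lv if x <= t else rv) for t, lv, rv in stumps)


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Toy step-shaped data, made up for the example.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
f0, stumps = boost(xs, ys)
mse_const = mse(ys, [f0] * len(ys))
mse_model = mse(ys, [predict(x, f0, stumps) for x in xs])
print(mse_const, mse_model)
```

With squared-error loss the residuals coincide with the negative gradient of the loss, which is why fitting residuals is a special case of gradient boosting.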

### Long Short-Term Memory Networks (LSTM)

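**Formula**: \begin{aligned} f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \\ i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \\ \tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \\ C_t &= f_t * C_{t-1} + i_t * \tilde{C}_t \\ o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \\ h_t &= o_t * \tanh(C_t) \end{aligned}

**Description**: A type of RNN that can learn long-term dependencies by using gates to control the flow of information. Here, f_t, i_t, and o_t are the forget, input, and output gates, \tilde{C}_t is the candidate cell state, C_t is the cell state, h_t is the hidden state, [h_{t-1}, x_t] is the concatenation of the previous hidden state and the current input, and * denotes element-wise multiplication.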
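To make the gate equations concrete, the sketch below runs one time step of a cell whose hidden state, cell state, and input are all scalars, so each weight matrix acting on [h_{t-1}, x_t] reduces to a pair of numbers. The `params` layout and the function names are hypothetical, and \sigma is taken to be the logistic sigmoid.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step with scalar states.

    params maps a gate name ("f", "i", "c", "o") to (w_h, w_x, b), the weights
    applied to h_prev and x_t plus the bias.
    """
    def pre_activation(gate):
        w_h, w_x, b = params[gate]
        return w_h * h_prev + w_x * x_t + b

    f_t = sigmoid(pre_activation("f"))        # forget gate
    i_t = sigmoid(pre_activation("i"))        # input gate
    c_tilde = math.tanh(pre_activation("c"))  # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde        # new cell state
    o_t = sigmoid(pre_activation("o"))        # output gate
    h_t = o_t * math.tanh(c_t)                # new hidden state
    return h_t, c_t


# With all-zero parameters every gate is 0.5 and the candidate is 0,
# so the cell state is simply halved at each step.
zero_params = {gate: (0.0, 0.0, 0.0) for gate in ("f", "i", "c", "o")}
h, c = lstm_step(x_t=1.0, h_prev=0.0, c_prev=1.0, params=zero_params)
print(h, c)
```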