```
# A tibble: 6 × 9
Age Sex ChestPain BP Cholesterol BloodSugar MaximumHR
<dbl> <fct> <fct> <dbl> <dbl> <lgl> <dbl>
1 63 Male Typical angina 145 233 TRUE 150
2 67 Male Asymptomatic 160 286 FALSE 108
3 67 Male Asymptomatic 120 229 FALSE 129
4 37 Male Non-anginal pain 130 250 FALSE 187
5 41 Female Atypical angina 130 204 FALSE 172
6 56 Male Atypical angina 120 236 FALSE 178
# ℹ 2 more variables: ExerciseInducedAngina <fct>, HeartDisease <fct>
```

## Overview

*Multiple testing*, what does it means?

Sometimes answering a research question means evaluating the different combinations of predictors, and this is the case when multiple hypothesis testing is made.

The following are the fundamental steps for understanding how to proceed when multiple hypothesis testing is needed.

Image credits: Hypothesis Testing On Linear Regression-medium.com

## Research questions

In general, hypothesis testing is made for comparing the expected values of two predictors which are the key drivers for explaining the trend of certain variables that are depending on the levels of those predictors.

Going into more detail, making a real-life example, we might need to evaluate the influence of gender and age on the increase of heart diseases. Under this specification, simple questions to be answered would be:

to be a female can be more or less risky in incurring heart disease?

or, what is the age on average at which heart diseases start increasing?

To answer these questions we need to make some assumptions:

there is no gender difference in heart diseases increasing age is not a cause of heart diseases

## Hypothesis testing

The first hypothesis is namely, the **null hypothesis** \(H_{0}\), and in general, it assumes that there would be no difference in mean (expected value) between the levels of the predictors. In case the null hypothesis fails, an alternative hypothesis is considered, such as there is a difference.

- Null hypothesis \(H_{0}\): the difference in mean equals zero
- Alternative hypothesis \(H_{a}\): the difference in mean is not zero

This is the starting point of making a hypothesis testing, then more variables can be tested, excluded, or kept whether there is a difference in the mean or not. And, so we are talking of multiple testing of hypothesis.

### Steps to hypothesis testing

- Define the null and alternative hypothesis
- Construct a test statistic against the null hypothesis
- Compute a p-value to quantify the probability to obtain a value that is the same or more extreme than the test statistic under the null hypothesis
- Decide whether to reject the null hypothesis

### Step back to modeling

Let’s hypothesize that we are investigating the root cause of heart diseases. To be simplistic we hypothesize that only gender and age are the key drivers for high blood pressure, the cause of the disease.

How would you make a model for answering this question?

As an example, we use the **heart_disease** dataset from the {cheese} package, and to begin our investigation we consider two predictors **Age** and **Sex**.

**Model function**: \(y=\beta_{0}+\beta_{1}x_{1}+\beta_{2}x_{2}\epsilon\)

**Hypothesis**:

- \(\beta_{i}\) equals to zero
- there is no difference between the mean blood pressure in female and male groups.

```
heart_disease%>%
ggplot(aes(y=HeartDisease,fill=factor(Sex)),color="white") +
geom_histogram(stat="count",position = "dodge")+
labs(fill="Sex")+
scale_fill_brewer()+
theme_test()
```

To be continued…