Chapter 12 wave form data visualization example for public health
12.0.1 Data Visualization example (EEG data)
In public health, another most common used data is bio-signal data. Bio-signal data usually used in medical research, but recently bio-log signals are widely used in public health beyond medical setting. Now I present some example of data visualization using bio-signal data from EEG.
12.0.2 Introduction
EEG refers to the signal of the brain’s electrical activty. Electrodes are placed on the scalp, and each electrodes recorded brain’s activity. EEG is one of the most common methods to support diagnosis of brain diseases such as epilepsy, sleep disorder and even brain death. Furthermore, we can get usufull understand how brain activity correlated to various neurological activity. So, I tried to anaysis EEG signal and hope to predict eye opening status.
The data include information of row_n
, AF3
, F7
, F3
, FC5
, T7
,P7
, O1
, O2
, P8
, T8
, FC6
, F4
, F8
, AF4
, and eyeDetection
. Eye Detection is outcome variable data Open
or Closed
. Others numeric variable about activity of each electrodes. Each electro-nodes place on scalp, and represent area of particular location on brain, as below.
rm(list=ls())
<- "https://upload.wikimedia.org/wikipedia/commons/thumb/7/70/21_electrodes_of_International_10-20_system_for_EEG.svg/1024px-21_electrodes_of_International_10-20_system_for_EEG.svg.png"
url download.file(url, 'images/eegpng.png')
12.0.3 Dataset and Data step
12.0.3.1 Data download and handling
The data set locate here, https://archive.ics.uci.edu/ml/machine-learning-databases/00264/EEG%20Eye%20State.arff
.
The data was stored into my computer, as name of ‘dl_eeg.txt’ in `data’ folder.
='https://archive.ics.uci.edu/ml/machine-learning-databases/00264/EEG%20Eye%20State.arff'
url download.file(url, 'data/dl_eeg.txt')
To start the data step, some packages should be loaded.
if(!require("tidyverse")) install.packages("tidyverse")
if(!require("htmlTable")) install.packages("htmlTable")
if(!require("broom")) install.packages("broom")
if(!require("ggthemes")) install.packages("ggthemes")
# packages from github
if(!require("devtools")) install.packages("devtools")
library(devtools)
install_github("jinhaslab/tabf", force = TRUE)
## * checking for file ‘/tmp/Rtmp34RMib/remotes1625d3f6760ec/jinhaslab-tabf-e7da070/DESCRIPTION’ ... OK
## * preparing ‘tabf’:
## * checking DESCRIPTION meta-information ... OK
## * checking for LF line-endings in source and make files and shell scripts
## * checking for empty or unneeded directories
## * building ‘tabf_0.0.0.9000.tar.gz’
library(tabf)
#library(caret)
library(knitr)
library(kableExtra)
#library(doMC)
scan the data and create DB for analysis
<- scan('data/dl_eeg.txt', what="", sep="")
mm which(mm == '@DATA')
## [1] 48
<-mm[-c(1:48)]
dat<- mm[1:48]
book -c(1:2, 48)] %>%
book[matrix(., ncol=3, byrow=TRUE) %>%
2] -> col_names
.[, col_names
## [1] "AF3" "F7" "F3" "FC5" "T7"
## [6] "P7" "O1" "O2" "P8" "T8"
## [11] "FC6" "F4" "F8" "AF4" "eyeDetection"
the tibble form is easy to hand or transforming. So, I change the data form to tibble style.
tibble(wave =dat) %>%
mutate (val = strsplit(wave, ","), row_n=row_number()) %>%
unnest (cols=c(val)) %>%
select (-wave) %>%
mutate (val = as.numeric(val)) %>%
group_by(row_n) %>%
mutate (wave_colname = col_names) %>%
ungroup() %>%
pivot_wider(names_from = wave_colname, values_from = val)%>%
select (row_n, all_of(col_names)) %>%
arrange (row_n) ->
eeg
I check the class of all variable.
names(eeg)
## [1] "row_n" "AF3" "F7" "F3" "FC5"
## [6] "T7" "P7" "O1" "O2" "P8"
## [11] "T8" "FC6" "F4" "F8" "AF4"
## [16] "eyeDetection"
sapply(eeg, mode)
## row_n AF3 F7 F3 FC5 T7
## "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
## P7 O1 O2 P8 T8 FC6
## "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
## F4 F8 AF4 eyeDetection
## "numeric" "numeric" "numeric" "numeric"
This data were measure for 117 seconds, I create sec
variable to represent measurement time. I create facotor variable for eye opening status `Eye’, as below.
<- eeg %>%
eeg mutate(Eye = ifelse(eyeDetection ==1, 'open', 'closed')) %>%
mutate(sec = seq(0, 117, length.out = nrow(.)))
For data step, the long form data were created via pivot_longer
function. the key represent each electrode, and the activity were stored into activity
variable. EyeDetection, row_n, sec, Eye will be indentical variable for each electrode activity.
%>%
eeg pivot_longer(-c(eyeDetection, row_n, sec, Eye),
names_to = 'electrode',
values_to= 'activity') ->
eegl%>% head() eegl
## # A tibble: 6 × 6
## row_n eyeDetection Eye sec electrode activity
## <int> <dbl> <chr> <dbl> <chr> <dbl>
## 1 1 0 closed 0 AF3 4329.
## 2 1 0 closed 0 F7 4009.
## 3 1 0 closed 0 F3 4289.
## 4 1 0 closed 0 FC5 4148.
## 5 1 0 closed 0 T7 4350.
## 6 1 0 closed 0 P7 4586.
12.0.3.2 data explorer
12.0.3.2.1 heatmap
The heatmap can show overview the time course about eye opening status. Dark blue indicate closed status, and light blue indicate eye open status.
%>%
eegl ggplot(aes(x = sec, y = 0.5, fill = eyeDetection)) +
geom_tile() +
scale_fill_continuous(name = "Eye status",
limits = c(0, 1),
labels = c('closed','open'),
breaks = c(0, 1)) +
theme_classic() +
scale_y_continuous(breaks = c(0, 1), 'Eye status') +
scale_x_continuous(breaks = c(0, 30, 60, 90, 117)) +
theme(aspect.ratio=1/7)
##### waveform
The activity waves are plotting for data explorer.
%>%
eegl ggplot(aes(x = row_n, y = activity, color = Eye)) +
geom_line(size = c(1)) +
geom_point(alpha = 1)+
facet_grid(electrode ~. ) +
theme_classic()
There are much of outlier, so the fluctuation of each node’s activity atteunted. So We needed to exclude outlier. To ensure the outlier status, I plot the boxplot, as below. The boxplot also suggested that there were much of outlier effect.
%>%
eegl ggplot(aes(x = electrode, y = activity, color = Eye)) +
geom_boxplot() +
scale_y_continuous(trans='log2') +
theme_classic()
Althoug there are outlier effect on data, but I just want to check the t-test
results.
<- eegl %>% select(-eyeDetection, -row_n) %>%
ttestgroup_by(electrode) %>%
nest() %>%
mutate(stats = map(data, ~t.test(activity ~ Eye, data=.x)),
summarise = map(stats, broom::glance)) %>%
select(electrode, summarise) %>%
unnest(summarise)
%>%
ttest mutate_if(is.numeric, round, 2) %>%
mutate(p.values = ifelse(p.value <0.01, '<0.01', as.character(p.value))) %>%
select(electrode,`difference` = estimate, 'open' = estimate1, 'closed' = estimate2, p.values) %>%
htmlTable(
cgroup=c("", "Eye status", ""),
n.cgroup=c(2, 2, 1),
caption = "Table1. t-test result without outlier cleaning",
align = 'llrc'
)
Table1. t-test result without outlier cleaning | |||||||
Eye status | |||||||
---|---|---|---|---|---|---|---|
electrode | difference | open | closed | p.values | |||
1 | AF3 | -52.4 | 4298.4 | 4350.8 | 0.25 | ||
2 | F7 | 7.39 | 4013.08 | 4005.69 | <0.01 | ||
3 | F3 | -3.47 | 4262.46 | 4265.94 | <0.01 | ||
4 | FC5 | 78.98 | 4200.39 | 4121.41 | 0.31 | ||
5 | T7 | 0.03 | 4341.75 | 4341.73 | 0.96 | ||
6 | P7 | 46.13 | 4664.73 | 4618.59 | 0.29 | ||
7 | O1 | 66.82 | 4140.39 | 4073.57 | 0.33 | ||
8 | O2 | -1.48 | 4615.39 | 4616.87 | <0.01 | ||
9 | P8 | -41.13 | 4200.37 | 4241.5 | 0.29 | ||
10 | T8 | -3.61 | 4229.7 | 4233.31 | <0.01 | ||
11 | FC6 | -4.88 | 4200.26 | 4205.15 | <0.01 | ||
12 | F4 | -4.01 | 4277.43 | 4281.44 | <0.01 | ||
13 | F8 | -31.88 | 4600.9 | 4632.77 | 0.15 | ||
14 | AF4 | 89.42 | 4456.57 | 4367.15 | 0.31 |
12.0.3.3 data cleaning for outlier
The visualization of interquartile range is useful for detecting the presenting of outliers. Outliers are individual values that fall outside of the overall pattern of a data set. So, the outlier can be define when they are far way as several times of interquarile range from median. I made scale_median
function to detect outlier. there are several step to make function. first, calculate the interquartile range. The differece between median and value divided by inter quartile range, and the absolute value will be Z-score. That z-score represent that how many times of interquartile range far from median. I used score of 2.5 times par form median for detect outlier.
<- function(x){
scale_median <- quantile(x, 0.25)
Q1 <- quantile(x, 0.75)
Q3 <- c(Q3 - Q1)
iqr ifelse(x < median(x),(x - Q1)/iqr*-1,(x - Q3)/iqr*1 )
}<-
eegl1%>%
eegl group_by(electrode) %>%
nest() %>%
mutate(Zscores = map (data, ~scale_median(.$activity))) %>%
unnest(cols = c(data, Zscores)) %>%
filter(Zscores < 2.5) #Outlier is Zscores > 2.5 or <-2.5
I check box-plot without outlier. That plot sounds good for me. So, I can generate wave form plot.
%>%
eegl1 ggplot(aes(x = electrode, y = activity, color = Eye)) +
geom_boxplot()
Here, I plot wave form of each electrodes’ activity. Before generate the wave plot, I want increase the working cores.
library('doMC')
library(parallel)
getDoParWorkers()
## [1] 1
<- detectCores() number_of_cores
My computer use only 1 core, so I increase number of cores working up-to 46 cores.
registerDoMC(cores = number_of_cores - 2)
getDoParWorkers()
## [1] 46
Then, I want plot wave form.
%>%
eegl1 ggplot(aes(x = sec, y = activity, fill = eyeDetection)) +
geom_point(size = 0.1, alpha =0.5) +
geom_line(aes(color = eyeDetection), show.legend = F) +
facet_grid(electrode ~., scales = "free" ) +
theme_classic() +
scale_fill_continuous(name = "Eye status", limits = c(0, 1),
labels = c('close', 'open'), breaks = c(0, 1)) +
xlab('Time series') +
scale_x_continuous(breaks = c(0, 30, 60, 90, 117))