# Time Series and Stationarity

A time series is a series of data points captured in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. This post is the first in a series of blogs on time series methods and forecasting.

In this post, we will discuss stationarity, random walks, deterministic trends and other vocabulary that forms the foundation of time series analysis.

## Stochastic processes

A random or stochastic process is a collection of random variables ordered in time, denoted $Y_t$. For example, the in-time (arrival time) of an employee is a stochastic process. Why? Suppose the in-time on a particular day is 9:00 AM. In theory, the in-time could have taken any value, depending on factors such as traffic, workload and weather. The figure 9:00 AM is one particular realization of many such possibilities. We can therefore say that in-time is a stochastic process, whereas the values actually observed are a particular realization (sample) of that process.

## Stationary Processes

A stochastic process is said to be stationary if the following conditions are met:
1. Mean is constant over time
2. Variance is constant over time
3. The covariance between two time periods depends only on the lag (the distance or gap) between them, and not on the actual time at which the covariance is computed

This type of process is also called a weakly stationary, covariance stationary, second-order stationary or wide-sense stationary process.

Written mathematically, the conditions are:

$$\text{Mean: } E(Y_t) = \mu$$

$$\text{Variance: } var(Y_t) = E[(Y_t-\mu)^2] = \sigma^2$$

$$\text{Covariance: } \gamma_k = E[(Y_t - \mu)(Y_{t+k} - \mu)]$$
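The covariance condition can be made concrete with a small sketch. The helper `autocovariance` below is a hypothetical name, not a library function; it estimates $\gamma_k$ from a single series by averaging $(Y_t - \bar{Y})(Y_{t+k} - \bar{Y})$. The test series of independent Uniform(0, 1) draws is an assumption chosen only for illustration.

```python
import random
from statistics import fmean


def autocovariance(series, lag):
    """Sample estimate of gamma_k: average of (y_t - mean) * (y_{t+lag} - mean)."""
    n = len(series)
    mean = fmean(series)
    total = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return total / n


# Assumed example data: 5000 independent Uniform(0, 1) draws
rng = random.Random(0)
series = [rng.random() for _ in range(5000)]

gamma_0 = autocovariance(series, 0)  # lag 0 is the variance, about 1/12 here
gamma_1 = autocovariance(series, 1)  # independent draws, so close to 0

print(f"gamma_0 = {gamma_0:.4f}")
print(f"gamma_1 = {gamma_1:.4f}")
```

At lag 0 the estimate reduces to the sample variance, which matches the definition $\gamma_0 = \sigma^2$.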

### Purely random or white noise process

A stochastic process is purely random, or white noise, if it has zero mean, constant variance, and is serially uncorrelated. An example of white noise is the error term in a classical linear regression model, which is assumed to have zero mean, constant standard deviation and no autocorrelation.

### Simulation

To simulate a stationary process, I create 100 realizations (samples) and compare their mean, variance and covariance. The data for 6 days and 5 of the realizations are shown below:

Samples of stationary process

| Day | date | realization_1 | realization_2 | realization_25 | realization_50 | realization_100 |
|---|---|---|---|---|---|---|
| 1 | 2021-08-22 | 0.8567523 | 0.0640669 | 0.7249642 | 0.3505523 | 0.8072590 |
| 2 | 2021-08-23 | 0.1086189 | 0.3813137 | 0.5232923 | 0.4626156 | 0.7855022 |
| 3 | 2021-08-24 | 0.4652674 | 0.3546999 | 0.2091995 | 0.2395056 | 0.9567884 |
| 10 | 2021-08-31 | 0.8191081 | 0.1502863 | 0.1491222 | 0.6095235 | 0.3435151 |
| 15 | 2021-09-05 | 0.9950261 | 0.1406165 | 0.1177429 | 0.9329218 | 0.4191168 |
| 30 | 2021-09-20 | 0.8800055 | 0.9952208 | 0.7189119 | 0.7209880 | 0.6886932 |

The mean, variance and covariance across the samples (realizations) over time are as follows:

[Figure: mean, variance and covariance of the stationary process realizations over time]

For a stationary process, the mean, variance and covariance are constant over time.
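A simulation of this kind can be sketched as follows. This is a minimal sketch, not the code behind the table above: it assumes the values are independent Uniform(0, 1) draws (the tabulated values all lie between 0 and 1), and it uses 2000 realizations rather than 100 so that the estimates are less noisy. The helper names `day` and `covariance` are illustrative.

```python
import random
from statistics import fmean, pvariance

N_REALIZATIONS = 2000  # the post uses 100; more paths give smoother estimates
N_DAYS = 30

rng = random.Random(42)
# paths[r][t] is the value of realization r on day t (assumed i.i.d. Uniform(0, 1))
paths = [[rng.random() for _ in range(N_DAYS)] for _ in range(N_REALIZATIONS)]


def day(t):
    """Values of all realizations on day index t (0-based)."""
    return [path[t] for path in paths]


def covariance(a, b):
    """Covariance between two equally long lists of values."""
    mean_a, mean_b = fmean(a), fmean(b)
    return fmean([(x - mean_a) * (y - mean_b) for x, y in zip(a, b)])


# Statistics are taken across realizations, separately for each day
means = [fmean(day(t)) for t in range(N_DAYS)]
variances = [pvariance(day(t)) for t in range(N_DAYS)]
lag1_covs = [covariance(day(t), day(t + 1)) for t in range(N_DAYS - 1)]

print(f"mean per day:       min={min(means):.3f}  max={max(means):.3f}")
print(f"variance per day:   min={min(variances):.3f}  max={max(variances):.3f}")
print(f"lag-1 covariance:   min={min(lag1_covs):.3f}  max={max(lag1_covs):.3f}")
```

Under these assumptions the per-day mean should stay near 0.5, the per-day variance near 1/12, and the lag-1 covariance near 0, whichever day is inspected.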

## Non-stationary Processes

If a time series is not stationary, it is called a non-stationary time series. In other words, a non-stationary time series has a time-varying mean, a time-varying variance, or both. The random walk and the random walk with drift are examples of non-stationary processes.

### Random walk

Suppose $\epsilon_t$ is a white noise error term with mean 0 and variance $\sigma^2$. Then the series $Y_t$ is said to be a random walk if

$$Y_t = Y_{t-1} + \epsilon_t$$

In the random walk model, the value of $Y$ at time $t$ is equal to its value at time $t-1$ plus a random shock. For a random walk,

$$Y_1 = Y_0 + \epsilon_1$$

$$Y_2 = Y_1 + \epsilon_2 = Y_0 + \epsilon_1 + \epsilon_2$$

$$Y_3 = Y_2 + \epsilon_3 = Y_0 + \epsilon_1 + \epsilon_2 + \epsilon_3$$

and so on. In general, we can write

$$Y_t = Y_0 + \sum_{i=1}^{t} \epsilon_i$$

Treating $Y_0$ as fixed, and because the shocks have zero mean and are serially uncorrelated,

$$E(Y_t) = E\left(Y_0 + \sum_{i=1}^{t} \epsilon_i\right) = Y_0$$

$$var(Y_t) = \sum_{i=1}^{t} var(\epsilon_i) = t\times \sigma^2$$

Although the mean is constant over time, the variance is proportional to time.

To simulate a random walk process, I create 100 realizations (samples) and compare their mean, variance and covariance. The data for 6 days and 5 of the realizations (samples) are shown below:

Samples of random walk process

| Day | date | realization_1 | realization_2 | realization_25 | realization_50 | realization_100 |
|---|---|---|---|---|---|---|
| 1 | 2021-08-22 | 4.0000000 | 4.000000 | 4.000000 | 4.000000 | 4.000000 |
| 2 | 2021-08-23 | 2.8539208 | 3.176672 | 5.446530 | 5.983017 | 4.092452 |
| 3 | 2021-08-24 | 2.9712968 | 2.009320 | 5.349939 | 5.785329 | 3.451442 |
| 10 | 2021-08-31 | -0.7251274 | 2.289063 | 2.809076 | 7.623148 | 3.587220 |
| 15 | 2021-09-05 | -0.5766986 | 2.559916 | 5.796322 | 11.124585 | 3.992667 |
| 30 | 2021-09-20 | 0.8613258 | 6.340583 | 7.554369 | 12.667196 | 9.039007 |

The mean, variance and covariance across the samples (realizations) over time are as follows:

[Figure: mean, variance and covariance of the random walk realizations over time]

From the above plot, the mean of $Y$ is equal to its initial (starting) value, which is constant, but as $t$ increases its variance increases indefinitely, thus violating a condition of stationarity.
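The growth of the variance can be checked numerically. The sketch below is an illustration under stated assumptions rather than the code used for the table: the starting value $Y_0 = 4$ is taken from the table, while Gaussian shocks with $\sigma = 1$ and the use of 2000 realizations (instead of 100) are my assumptions. The function name `random_walk` is hypothetical.

```python
import random
from statistics import fmean, pvariance

Y0 = 4.0               # starting value, as in the table above
SIGMA = 1.0            # assumed standard deviation of the shocks
N_STEPS = 29           # 29 shocks after the starting day gives 30 days
N_REALIZATIONS = 2000  # the post uses 100; more paths reduce sampling noise

rng = random.Random(7)


def random_walk(y0, sigma, n_steps):
    """Return [Y_0, ..., Y_n] with Y_t = Y_{t-1} + eps_t, eps_t ~ N(0, sigma^2)."""
    path = [y0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, sigma))
    return path


paths = [random_walk(Y0, SIGMA, N_STEPS) for _ in range(N_REALIZATIONS)]

# Compare the empirical variance across realizations with t * sigma^2
for t in (0, 1, 9, 29):
    values = [path[t] for path in paths]
    print(
        f"t={t:2d}  mean={fmean(values):6.3f}  "
        f"variance={pvariance(values):7.3f}  t*sigma^2={t * SIGMA**2:5.1f}"
    )
```

Here $t$ counts the number of shocks since $Y_0$, so day 30 in the table corresponds to $t = 29$.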

A random walk process is also called a unit root process.

### Random walk with drift

If the random walk model predicts that the value at time $t$ will equal the last period's value plus a constant, or drift ($\delta$), and a white noise term ($\epsilon_t$), then the process is a random walk with drift:

$$Y_t = \delta + Y_{t-1} + \epsilon_t$$

Substituting repeatedly, as for the random walk, gives $Y_t = Y_0 + t\times\delta + \sum_{i=1}^{t} \epsilon_i$. The mean

$$E(Y_t) = E\left(Y_0 + t\times\delta + \sum_{i=1}^{t} \epsilon_i\right) = Y_0 + t\times\delta$$

is therefore dependent on time, and the variance

$$var(Y_t) = t\times \sigma^2$$

is also dependent on time. As the random walk with drift violates the conditions of a stationary process, it is a non-stationary process.

Samples of random walk with drift process

| Day | date | realization_1 | realization_2 | realization_25 | realization_50 | realization_100 |
|---|---|---|---|---|---|---|
| 1 | 2021-08-22 | 4.000000 | 4.000000 | 4.000000 | 4.000000 | 4.000000 |
| 2 | 2021-08-23 | 5.801028 | 5.137174 | 5.133682 | 4.186611 | 4.148583 |
| 3 | 2021-08-24 | 5.668875 | 5.741908 | 4.304969 | 1.706955 | 4.467738 |
| 10 | 2021-08-31 | 11.843748 | 12.207555 | 6.289473 | 5.411678 | 6.595657 |
| 15 | 2021-09-05 | 15.982235 | 15.516312 | 10.664175 | 5.937640 | 11.215054 |
| 30 | 2021-09-20 | 25.507838 | 22.875968 | 20.498484 | 11.750327 | 16.976809 |

The mean, variance and covariance are all dependent on time.
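To see the time-dependent mean, the drift process can be simulated and compared with $Y_0 + t\times\delta$. This is a sketch under assumptions: $Y_0 = 4$ comes from the table, whereas $\delta = 0.5$, Gaussian shocks with $\sigma = 1$ and 2000 realizations are values I have chosen for illustration. The function name `random_walk_with_drift` is hypothetical.

```python
import random
from statistics import fmean, pvariance

Y0 = 4.0               # starting value, as in the table above
DELTA = 0.5            # assumed drift per period
SIGMA = 1.0            # assumed standard deviation of the shocks
N_STEPS = 29           # 29 shocks after the starting day gives 30 days
N_REALIZATIONS = 2000  # more paths than the post's 100, to reduce noise

rng = random.Random(11)


def random_walk_with_drift(y0, delta, sigma, n_steps):
    """Return [Y_0, ..., Y_n] with Y_t = delta + Y_{t-1} + eps_t."""
    path = [y0]
    for _ in range(n_steps):
        path.append(delta + path[-1] + rng.gauss(0.0, sigma))
    return path


paths = [
    random_walk_with_drift(Y0, DELTA, SIGMA, N_STEPS)
    for _ in range(N_REALIZATIONS)
]

# Empirical mean and variance across realizations versus the formulas
for t in (0, 9, 29):
    values = [path[t] for path in paths]
    print(
        f"t={t:2d}  mean={fmean(values):7.3f} (Y0 + t*delta = {Y0 + t * DELTA:5.1f})  "
        f"variance={pvariance(values):7.3f} (t*sigma^2 = {t * SIGMA**2:4.1f})"
    )
```

Setting `DELTA = 0.0` recovers the plain random walk, which makes it easy to compare the two processes side by side.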

### Unit root stochastic process

Unit root stochastic process is another name for the random walk process. A random walk process can be written as

$$Y_t = \rho \times Y_{t-1} + \epsilon_t$$

where $\rho = 1$. If $|\rho| < 1$, the process is a first-order Markov autoregressive, or AR(1), model, which is stationary. When $\rho = 1$, the process has a unit root and is non-stationary. The mean, variance and covariance for $\rho = 0.5$ are as follows:

[Figure: mean, variance and covariance of the process with $\rho = 0.5$ over time]

### Deterministic trend process

In the random walk and the random walk with drift above, the trend component is stochastic in nature. If instead the trend is deterministic, the series follows a deterministic trend process:

$$Y_t = \beta_1 + \beta_2\times t + \epsilon_t$$

In a deterministic trend process, the mean is $\beta_1 + \beta_2\times t$, which changes linearly with time, but the variance is constant. This type of process is also called a trend stationary process, as subtracting the mean of $Y_t$ from $Y_t$ gives a stationary process. This procedure is called de-trending.
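De-trending can be sketched in a few lines. In practice the true mean $\beta_1 + \beta_2\times t$ is unknown, so this sketch estimates it with an ordinary least squares line and subtracts the fitted values; that estimation step is my assumption, as are the parameter values $\beta_1 = 0$, $\beta_2 = 1$, $\sigma = 1$ and the series length of 400.

```python
import random
from statistics import fmean, pvariance

BETA_1, BETA_2, SIGMA = 0.0, 1.0, 1.0  # assumed parameters for illustration
N = 400                                # assumed series length

rng = random.Random(3)
t_values = list(range(1, N + 1))
y = [BETA_1 + BETA_2 * t + rng.gauss(0.0, SIGMA) for t in t_values]

# Ordinary least squares fit of y on t (closed-form slope and intercept)
t_bar, y_bar = fmean(t_values), fmean(y)
slope = sum((t - t_bar) * (v - y_bar) for t, v in zip(t_values, y)) / sum(
    (t - t_bar) ** 2 for t in t_values
)
intercept = y_bar - slope * t_bar

# De-trend: subtract the fitted trend from each observation
detrended = [v - (intercept + slope * t) for t, v in zip(t_values, y)]

first_half, second_half = detrended[: N // 2], detrended[N // 2 :]
print(f"estimated slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"raw mean:        first half {fmean(y[: N // 2]):8.3f}, second half {fmean(y[N // 2 :]):8.3f}")
print(f"detrended mean:  first half {fmean(first_half):8.3f}, second half {fmean(second_half):8.3f}")
print(f"detrended var:   first half {pvariance(first_half):8.3f}, second half {pvariance(second_half):8.3f}")
```

The raw series has a very different mean in its two halves, while the de-trended series should have a mean near zero and a similar variance in both, which is what the stationarity conditions require.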

Samples of deterministic trend process

| Day | date | realization_1 | realization_2 | realization_25 | realization_50 | realization_100 |
|---|---|---|---|---|---|---|
| 1 | 2021-08-22 | 0.6408252 | 0.4380591 | 3.145050 | 0.5609335 | 1.175747 |
| 2 | 2021-08-23 | 0.7580409 | 1.5146209 | 1.939386 | 2.3880532 | 2.137324 |
| 3 | 2021-08-24 | 1.7964618 | 4.6772129 | 5.069398 | 4.7261802 | 2.939012 |
| 10 | 2021-08-31 | 7.7861035 | 9.3627214 | 8.584801 | 10.7855074 | 9.830755 |
| 15 | 2021-09-05 | 15.8649412 | 14.2029992 | 15.279319 | 13.5478111 | 14.647241 |
| 30 | 2021-09-20 | 30.9469391 | 30.6891297 | 29.446486 | 28.9586901 | 31.170632 |

A combination of deterministic and stochastic trends could also exist in a process.

## Comparison

A comparison of all the processes is shown below:

[Figure: comparison of all the processes]

## References

1. Basic Econometrics - Damodar N Gujarati (textbook for reference)
2. Business Analytics: The Science of Data-Driven Decision Making - Dinesh Kumar (textbook for reference)
3. Customer Analytics at Flipkart.Com - Naveen Bhansali (case study in Harvard Business Review)