Hello guys,
I hope you are doing well.
In this new post I would like to explain the aim of my research by using a real-life example.
Whenever my friends or family (who don't know much about machine learning,
statistics, or data analysis) ask me to explain what I am doing with
mathematical models and methods in my PhD, I often reply by saying
"I am a kind of medical doctor... I
analyse batches of data, which represent the behavior of a system, which can be
a bridge, a tunnel, a nuclear power plant component, a steam generator, a
train, a car, a sensor, a financial product, or whatever you like (also a
human being, if you can provide me with some data), and based on the
characteristics of these batches, I assess the health condition of the
system, and consequently infer whether the system is healthy or needs to take
some drugs".
Although the explanation is a bit vague, it does the job and people sort of get an idea
about my research. However, as you know, people are usually skeptical and
do not believe that the same kind of mathematical approaches (such as clustering
techniques, machine learning methods, etc.) can be used in such a wide range of areas,
from bridges to nuclear power plant components to finance.
Therefore,
in what follows I am going to perform a data analysis of the price of the
Bitcoin, by using the Empirical Mode Decomposition (EMD) method,
which is used to analyse the behavior of a system. This way, I am going to try
to predict the price of the Bitcoin by relying on its historical price and
trend.
Bitcoin is the most famous cryptocurrency, and it could become the currency of the future.
Nowadays, a single Bitcoin is worth more than $4000!!
Do not
worry, I am not going to explain the mathematical background of the method, and
I promise that you won't see any formulas!!
- EMD description
The idea of the EMD method, which was
initially developed and proposed by Norden Huang et al., is to decompose a signal into a set of
basic modes, which are different frequency components of the signal.
Essentially, it breaks one signal down into a sum of different "simpler"
signals with different frequencies. Any non-stationary and nonlinear time-varying
signal can be represented by a series of signals, where each individual
signal has a different frequency. Each frequency component is called an Intrinsic
Mode Function (IMF). The EMD also provides a residual function, which is
monotonic or constant and represents the overall behavior of the signal during
the time of the analysis. You can find a good discussion here.
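For the curious, here is a rough Python sketch of the idea described above. It is only an illustration under simplifying assumptions, not the code used for this post: the envelopes are piecewise linear (the standard EMD uses cubic splines), the number of sifting passes is fixed, and the names `local_extrema`, `envelope`, `sift` and `emd` are my own hypothetical helpers. The toy signal is synthetic, not Bitcoin data.

```python
import math

def local_extrema(x):
    """Return the indices of strict local maxima and minima of a list."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] > x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] < x[i + 1]:
            minima.append(i)
    return maxima, minima

def envelope(x, idx):
    """Piecewise-linear envelope through the points (i, x[i]) for i in idx.
    Assumption: the first and last samples are added as anchors, which is a
    crude boundary rule; real EMD codes use cubic splines instead."""
    knots = sorted(set([0] + idx + [len(x) - 1]))
    env, k = [], 0
    for i in range(len(x)):
        while k < len(knots) - 2 and i > knots[k + 1]:
            k += 1
        a, b = knots[k], knots[k + 1]
        t = (i - a) / (b - a)
        env.append(x[a] + t * (x[b] - x[a]))
    return env

def sift(x, n_sift=10):
    """Extract one IMF candidate by repeatedly subtracting the envelope mean."""
    h = list(x)
    for _ in range(n_sift):
        maxima, minima = local_extrema(h)
        if len(maxima) + len(minima) < 2:
            break
        upper = envelope(h, maxima)
        lower = envelope(h, minima)
        h = [v - (u + l) / 2 for v, u, l in zip(h, upper, lower)]
    return h

def emd(x, max_imfs=8):
    """Split x into a list of IMFs plus a residual; together they sum to x.
    Stops when the residual has fewer than two extrema or max_imfs is hit."""
    imfs, residual = [], list(x)
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residual)
        if len(maxima) + len(minima) < 2:
            break
        imf = sift(residual)
        imfs.append(imf)
        residual = [r - m for r, m in zip(residual, imf)]
    return imfs, residual

# Synthetic toy signal: upward trend + slow oscillation + fast oscillation.
signal = [0.05 * t + math.sin(2 * math.pi * t / 50)
          + 0.5 * math.sin(2 * math.pi * t / 7) for t in range(200)]
imfs, residual = emd(signal)

# Adding the IMFs and the residual back together recovers the input.
reconstructed = [sum(parts) + r for parts, r in zip(zip(*imfs), residual)]
```

The fastest oscillations are peeled off first, and whatever is left at the end plays the role of the residual. For real work, a dedicated EMD library with spline envelopes and a proper stopping criterion would be a better choice than this sketch.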
- Analysis of the Bitcoin price
Disclaimer: this post has been drafted on Tuesday the 3rd of
October, and therefore the analysis of the Bitcoin price is based on the data
available at that time!!
Let's consider the closing price of the Bitcoin in 2017. We are going to
first analyse the period from January 1st, 2017 to August 28th, 2017.
This way, we hope to draw some conclusions regarding the future price of the
Bitcoin, and then we can verify our conclusions with the Bitcoin price
from August the 29th to today (3rd of October).
The price of the Bitcoin is available at coinmarketcap.com.
It is worth noting that during this step of the analysis, the price of Bitcoin from
January the 1st, 2017 to August the 28th, 2017 is used as
the ONLY input to the EMD method!! This means that during the first step of the analysis, we DO
NOT KNOW the Bitcoin price after the 28th of August.
In Figure
1, you can see the price of a Bitcoin in the interval between January the 1st,
2017 (Day 1) and August
the 28th, 2017 (Day 243). Figure 1 shows that the price of the Bitcoin has
increased a lot in the last year, as you all may know.
Figure 1. Bitcoin price from January to August
If the Bitcoin
price of Figure 1 is used as input to the EMD method, we obtain Figure 2, which
shows all the frequency components of the Bitcoin price. Particularly, the
first IMF (IMF 1), which shows the highest frequency, depicts a variation of
the Bitcoin price that is very similar to the actual variation of the price of
the Bitcoin (as shown in Figure 1). Indeed, the first IMF (IMF 1) represents
the fastest variations of the Bitcoin price, and consequently it represents the
daily variation of the Bitcoin price. The "slower" IMFs, such as IMF
2 and IMF 3, show a trend analogous to IMF 1, but slower in frequency. The
analysis of the slowest frequency component of the Bitcoin price, which is IMF
4, is very interesting. In fact, Figure 2 shows that IMF 4 (fourth plot from
the left end of Figure 2) has 6 extreme values (3 peaks and 3 valleys), which
are almost equally spaced in time, and therefore a more detailed analysis may
lead to some interesting conclusions. Finally, the analysis of the residual
shows that the price of the Bitcoin has definitely increased during the first 8
months of the year.
Figure 2. EMD analysis of the Bitcoin price in the period January to August
A deeper analysis of IMF 4, which
shows the slowest variation of the Bitcoin price, is shown in Figure 3. Figure
3 shows the price of the Bitcoin in the top plot, whilst IMF 4 is depicted
in the bottom plot. The red dotted vertical lines mark the times of
the extreme values of IMF 4. It is easy to see that the
peak values of IMF 4 usually occur earlier in time than the corresponding peak
values of the Bitcoin price. For example, IMF 4 shows a peak at day 159,
which corresponds to the peak of the Bitcoin price at day 162, and consequently
by analyzing the behavior of IMF 4 we might predict a peak value of
the Bitcoin price!! A similar behavior is also observed for the peak at day 54,
although it is hard to see at the large time scale of Figure 3. Following
this reasoning, we can claim that the final peak at day 238 (August the 26th)
is anticipating a peak value of the Bitcoin price, which will then decrease in
the near future.
It should be noted that when IMF 4 decreases towards a valley, the Bitcoin price is lower than the previous peak value. This means that the Bitcoin price usually reaches a peak value, and then starts to decrease. Finally, it is worth noting that the valleys of the IMF 4, which are the minimum values of the IMF 4, are usually reached a couple of days later than the actual local minimum value of the Bitcoin price, as shown in Figure 3 around day 200.
Figure 3. Bitcoin price vs IMF4
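The comparison between the peaks of IMF 4 and the peaks of the price can be written down in a few lines of Python. This is a minimal sketch, assuming two hypothetical helpers of my own (`find_peaks` and `lead_times`) and a maximum lead of 10 days chosen only for illustration. The 159/162 pair is the one quoted above; the other price peak days are made up.

```python
def find_peaks(x):
    """Indices i where x[i] is strictly greater than both neighbours."""
    return [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] > x[i + 1]]

def lead_times(imf_peaks, price_peaks, max_lead=10):
    """Pair each IMF peak day with the earliest price peak day that falls
    0..max_lead days later; return (imf_day, price_day, lead) tuples."""
    pairs = []
    for p in imf_peaks:
        later = [q for q in price_peaks if 0 <= q - p <= max_lead]
        if later:
            first = min(later)
            pairs.append((p, first, first - p))
    return pairs

# Day numbers: 159 -> 162 is the pair quoted in the text; the price peaks
# at days 120 and 190 are invented values, for illustration only.
imf4_peaks = [159]
price_peaks = [120, 162, 190]
pairs = lead_times(imf4_peaks, price_peaks)
print(pairs)  # [(159, 162, 3)]
```

In practice `find_peaks` would be applied to the IMF 4 series and to the price series to obtain the two lists of peak days.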
Before moving to the analysis of the
Bitcoin price from August the 29th to October the 3rd, which has NOT been used
so far (the EMD analysis has been carried out by using only the
Bitcoin price from January to August), we can summarize the results of the EMD
analysis as follows:
1. The EMD analysis has shown that the
Bitcoin price can be decomposed into several frequency components
2. The fastest frequency components follow the daily/weekly
variability of the Bitcoin price
3. The slowest frequency component (IMF 4) seems able to predict a
peak in the Bitcoin price, which would then decrease for a period of time that
on average lasts for 35 to 40 days from the peak value of IMF 4.
4. Given these results, we can expect that the Bitcoin price will
have a peak during the first days of September, which will then be followed by
a period of time where the Bitcoin price decreases, i.e. the Bitcoin price will
be lower than the peak value reached at the beginning of September. After this
period of decrease, we can expect a new increase of the Bitcoin price after the
local minimum of IMF 4, which should be reached around 35 to 40 days after the
peak at day 238.
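To turn the day numbers of the summary into calendar dates, a tiny Python sketch is enough. It assumes that Day 1 is January the 1st, 2017 (as in Figure 1) and uses a hypothetical helper called `day_to_date`; the peak day and the 35 to 40 day range are the values quoted above.

```python
from datetime import date, timedelta

def day_to_date(day, year=2017):
    """Convert a day index (Day 1 = January the 1st) to a calendar date."""
    return date(year, 1, 1) + timedelta(days=day - 1)

peak_day = 238        # last IMF 4 peak found in the analysis
low, high = 35, 40    # average decreasing time quoted in the text, in days

# Expected window for the next local minimum of IMF 4.
window = (day_to_date(peak_day + low), day_to_date(peak_day + high))

print(day_to_date(peak_day))   # 2017-08-26
print(window[0], window[1])    # 2017-09-30 2017-10-05
```

So, under these assumptions, the next local minimum of IMF 4 would be expected between the end of September and the first days of October.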
At this point, we can verify our results by analyzing the Bitcoin price from August
the 29th. Figure 4 shows the Bitcoin price from
January the 1st to October the 3rd, where the red line depicts the Bitcoin
price from August the 29th to
October the 3rd. Again, it should be noted that the price of the Bitcoin during
this period (the red line in Figure 4) was not used in the previous EMD analysis,
and consequently the results and conclusions of the EMD analysis have been
achieved WITHOUT knowing the Bitcoin price during September.
Figure 4 shows that the EMD analysis results are confirmed: the Bitcoin price does reach a peak value,
which is followed by a decreasing trend. Indeed, the red line,
which shows the previously unseen Bitcoin price, shows that the Bitcoin price
reaches a peak value at day 244 (September the 1st), and then decreases.
In particular, it is worth noting that the peak value is not reached again during
September.
Figure 4. Price of Bitcoin from January to October. Red line is the price of the Bitcoin that is not used during the EMD analysis
Finally, we can use the Bitcoin price
during the whole period, from January to October, as input to the EMD method
with the aim of analyzing the Bitcoin price during the year so far, and trying
to predict the future Bitcoin price.
Figure 5 shows the results of the EMD analysis of the price of the Bitcoin during
the whole period. We can see that again the Bitcoin price can be decomposed into
4 frequency components: the fastest components (IMF 1, 2 and 3) follow the
variability of the Bitcoin price, whilst the slowest component (IMF 4) shows
the variation of the Bitcoin price over a long period of time. IMF 4 shows 7
extreme values: a) the first 5 extreme values are those of
Figures 2 and 3, which represent the maximum and minimum values of the Bitcoin
price during the period of time that has been analysed previously; b) the
6th extreme value, which is the peak at day 238, represents the maximum value of
the Bitcoin price that has been predicted by the EMD analysis; c) the
7th extreme value, which is the minimum value at day 268, represents the drop of
the Bitcoin price at day 257 (September the 14th). It should be noted that this
last minimum value occurs 30 days after the peak value at day 238, and
therefore a bit earlier than the expected time obtained by the
EMD during the first step of the analysis (35 to 40 days of average decreasing
time).
Figure 5. EMD analysis of the Bitcoin price in the period January to October
The brief analysis of the
Bitcoin price presented above is a good example of data analysis of
a system. Indeed, consider a system for which we continuously collect evidence of its
behavior, for example, measurements of the acceleration of
a bridge, the temperature of a water supply system, the current of an
electrical component, etc. We can analyse those data by computing their
frequency components, and consequently if the frequency components of the
system change suddenly, we can spot an unexpected behavior of the system!!
However, a more detailed and complex data analysis is usually required in order
to adequately assess the health state of a system.
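As a toy illustration of spotting a sudden change in the frequency content of a measurement, here is a small Python sketch. It does not use the EMD: as a much cruder stand-in, it counts zero crossings in consecutive windows and compares them with the first window. The helper names (`zero_crossings`, `flag_changes`), the window width and the threshold ratio are all assumptions of mine, and the signal is synthetic.

```python
import math

def zero_crossings(window):
    """Count sign changes of the mean-removed window (a crude frequency proxy)."""
    m = sum(window) / len(window)
    centered = [v - m for v in window]
    return sum(1 for a, b in zip(centered, centered[1:]) if a * b < 0)

def flag_changes(x, width=50, ratio=1.5):
    """Split x into consecutive windows and return the indices of the windows
    whose zero-crossing count differs from the first (baseline) window by
    more than a factor `ratio`, in either direction."""
    counts = [zero_crossings(x[i:i + width])
              for i in range(0, len(x) - width + 1, width)]
    base = max(counts[0], 1)
    return [k for k, c in enumerate(counts)
            if c > ratio * base or c * ratio < base]

# Synthetic measurement: period of 25 samples for the first 200 samples,
# then the oscillation suddenly speeds up to a period of 10 samples.
x = [math.sin(2 * math.pi * t / 25 + 0.3) for t in range(200)]
x += [math.sin(2 * math.pi * t / 10 + 0.3) for t in range(200, 300)]

flagged = flag_changes(x)
print(flagged)  # [4, 5]: the two windows after the change
```

A real health assessment would of course track the actual frequency components (for example the IMFs) rather than a simple zero-crossing count.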
Finally, I would like to mention that the analysis of the Bitcoin price is only a brief
and superficial analysis, carried out for a bit of fun and as an interesting
example of how my work can be used in many different aspects of real life, and
consequently it is not advice to invest money in Bitcoin. In fact,
I have only considered the closing price of the Bitcoin, without considering
other variables (such as the market cap, the daily news from newspapers or
governments, events, holidays, market decisions of companies, daily trade volume, etc.) that can
influence the Bitcoin price. A more detailed analysis of the Bitcoin price
could be done by considering those variables, and at the same time, a more reliable prediction of the price of
the Bitcoin could potentially be achieved by using machine learning methods,
such as artificial neural networks (ANN), or forecasting algorithms.
I hope
you enjoyed this post!!
Stay tuned,
Matteo