Journalist: Freesia

Editor: Tong

A question cryptocurrency investors often ask is how to predict whether a price will rise or fall within a single day, given how large the fluctuations are.

Several papers have tried to answer this question, and a new algorithm may answer it better. Recently, a team from the Department of Computer Science at Dartmouth College and the VeChain Foundation introduced an algorithm called C2P2 in the paper “A Collective Cryptocurrency Up/Down Price Prediction Engine”. The paper has been accepted by the IEEE Blockchain 2019 conference.

Innovations of C2P2

Bitcoin is the cryptocurrency whose price prediction has been studied most by researchers. In this paper, C2P2 is tested against one of the latest Bitcoin prediction methods and beats it. C2P2 is short for “Collective Cryptocurrency Price Prediction”. In its training model, the research team used collective classification of features, combined with similarity metrics and a novel iterative procedure, which ultimately returns the prediction.

Compared with two previous prediction methods developed by S. McNally (2018) and E. Sin (2017), C2P2 won in both experiments, improving accuracy by a range of 5.1% to 83%. The results were evaluated using the AUC metric and were statistically significant.
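For readers unfamiliar with AUC, the short sketch below shows one common way to compute it for up/down predictions: the share of (up day, down day) pairs in which the up day received the higher predicted probability. The function name `auc` and the sample labels and scores are made up for illustration and are not taken from the paper.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for an "up" day, 0 for a "down" day.
    scores: predicted probability of "up" for the same days.
    Ties between an up day and a down day count as half a win.
    """
    ups = [s for y, s in zip(labels, scores) if y == 1]
    downs = [s for y, s in zip(labels, scores) if y == 0]
    if not ups or not downs:
        raise ValueError("need at least one up day and one down day")

    wins = 0.0
    for u in ups:
        for d in downs:
            if u > d:
                wins += 1.0
            elif u == d:
                wins += 0.5
    return wins / (len(ups) * len(downs))


# Illustrative data only (not from the paper).
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.5]
print(round(auc(labels, scores), 3))
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking of up days above down days.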

The better accuracy of C2P2 comes from three important innovations. First, the algorithm comprehensively studies the up/down movements of the 21 most popular cryptocurrencies. So far, no other study has considered so many different cryptocurrencies in price prediction.

Second, C2P2 captures pairwise similarities between the feature vectors of any two cryptocurrencies and uses them for prediction. In the experiments comparing C2P2 with the two other methods, the similarity feature proved to have a significant impact on the results.

The research team used Reddit comments as one of the components of a feature vector (fc), extracting features from 1,624,674 comments. The other features in C2P2 that bear on up/down price movement are global economic metrics and the historical market records of the cryptocurrencies.

Another highlight of C2P2 is the use of a novel iterative procedure in training the prediction model. Since both internal and external features change every day, and cryptocurrency prices keep moving, the research team generates a “tentative” prediction and feeds it back to the predictive model to learn from in the next round. The model continues learning from the updated features until a result is reached, or until the number of iterations hits its upper limit.

Procedure of the algorithm

The procedure of C2P2 can be simplified into three steps:

1 Construct time-lagged features

2 Compute pairwise similarity

3 Iteratively update probability features and return predictions

[Figure]

Source: “A Collective Cryptocurrency Up/Down Price Prediction Engine”, 2019

As mentioned above, the team first captured features from several sources. Using time lags, they create a feature vector lfc,d from three inputs: historical market data for the 21 cryptocurrencies; 88 daily features based on global economic metrics; and comments from Reddit, which reflect public opinion on the price fluctuations. The global economic metrics cover a wide variety of economic data, from the world’s main stock indices to bond prices. The Reddit data was collected from 23 relevant discussion communities over half a year (July 1, 2018 to December 31, 2018).

To extract features from the comment text, the team used several tools, including Reddit Statistics, LIWC, and Google Cloud NL, to count, analyze and classify different emotions. In the end, they extracted 66 sentiment features for each cryptocurrency on each day.
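Step one can be pictured with a small sketch. The article does not spell out how the paper builds its lags, so the code below makes its own assumptions: the features for day d are simply the concatenated feature lists of the previous `lag` days, and the up/down label for day d compares that day's closing price with the previous day's. The function names and sample numbers are hypothetical.

```python
def time_lagged_features(daily_features, lag):
    """Build lagged feature vectors for one cryptocurrency.

    daily_features: list of per-day feature lists, oldest day first
        (for example market, economic and sentiment values).
    lag: how many previous days feed into each vector (an assumption).
    Returns {day index d: features of days d-lag .. d-1, concatenated}.
    """
    lagged = {}
    for d in range(lag, len(daily_features)):
        window = daily_features[d - lag:d]
        lagged[d] = [value for day in window for value in day]
    return lagged


def up_down_labels(closes):
    """Label day d as 1 (up) if its close is above the previous close, else 0."""
    return {d: int(closes[d] > closes[d - 1]) for d in range(1, len(closes))}


# Illustrative data only: two features per day over four days.
daily = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
closes = [100.0, 101.0, 99.0, 99.0]

features = time_lagged_features(daily, lag=2)
labels = up_down_labels(closes)
print(features[3], labels[3])
```

Because each vector only uses days before d, the label for day d is never leaked into its own features.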

Step two is to compute the similarity of each pair of cryptocurrencies from the lfc,d vectors of step one. The idea is that cryptocurrencies with similar feature vectors (S-c) should move up or down in a similar manner over time. Using S-c, C2P2 shows better statistical results than the two other methods, created by S. McNally and E. Sin. However, if S-c is left out of the learning model, the advantage of C2P2 shrinks: the Lift value falls from an average of 1.35 to 1.15. The higher the Lift value, the greater the advantage C2P2 has.
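As a rough illustration of step two, the sketch below computes a similarity score for every pair of cryptocurrencies from their feature vectors. The article does not say which similarity metrics the paper uses, so cosine similarity is an assumption here, and the ticker symbols and vectors are invented for the example.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_similarity(vectors):
    """vectors: {coin: feature vector}. Returns {(coin_a, coin_b): similarity}
    with both orderings of each pair filled in."""
    sims = {}
    for a, b in combinations(sorted(vectors), 2):
        s = cosine_similarity(vectors[a], vectors[b])
        sims[(a, b)] = s
        sims[(b, a)] = s
    return sims


# Illustrative vectors only (assumed, not from the paper).
vectors = {"BTC": [1.0, 0.0], "ETH": [2.0, 0.0], "XRP": [0.0, 3.0]}
sims = pairwise_similarity(vectors)
print(sims[("BTC", "ETH")], sims[("BTC", "XRP")])
```

In this toy case BTC and ETH point in the same direction and get the maximum score, while BTC and XRP share nothing and score zero.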

To predict the probability, the team uses the feature vectors to train the prediction models iteratively. In this process, each cryptocurrency’s similarities with every other cryptocurrency are used as additional features. As the training data keeps being updated, the probability predictions are updated along with it. Finally, the probabilities are returned once training ends.
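The loop structure of step three can be sketched in a simplified form. This is not the paper's procedure: instead of retraining a model each round, the code below uses a fixed blending rule as a stand-in, mixing each coin's own starting probability with the similarity-weighted average of the other coins' tentative probabilities. It stops when the predictions barely change or when an iteration cap is hit. The names `iterate_predictions`, `alpha`, `tol` and `max_iter`, and all numbers, are assumptions for illustration.

```python
def iterate_predictions(base_prob, sims, alpha=0.5, tol=1e-6, max_iter=100):
    """Iteratively refine per-coin probabilities of an "up" move.

    base_prob: {coin: starting probability from the coin's own features}.
    sims: {(coin_a, coin_b): similarity}; negative or missing values count as 0.
    alpha: weight given to the other coins' tentative predictions (assumed).
    Returns (final probabilities, number of iterations run).
    """
    probs = dict(base_prob)
    iteration = 0
    for iteration in range(1, max_iter + 1):
        updated = {}
        for c in probs:
            weights = {j: max(sims.get((c, j), 0.0), 0.0) for j in probs if j != c}
            total = sum(weights.values())
            if total == 0:
                # No similar coins: keep the coin's own starting estimate.
                updated[c] = base_prob[c]
                continue
            neighbour = sum(w * probs[j] for j, w in weights.items()) / total
            updated[c] = (1 - alpha) * base_prob[c] + alpha * neighbour
        change = max(abs(updated[c] - probs[c]) for c in probs)
        probs = updated
        if change < tol:
            break  # tentative predictions have stopped moving
    return probs, iteration


# Illustrative inputs only (assumed, not from the paper).
base = {"BTC": 0.8, "ETH": 0.6, "XRP": 0.3}
pair_sims = {("BTC", "ETH"): 0.9, ("BTC", "XRP"): 0.1, ("ETH", "XRP"): 0.2}
pair_sims.update({(b, a): s for (a, b), s in list(pair_sims.items())})

final, rounds = iterate_predictions(base, pair_sims)
print({coin: round(p, 3) for coin, p in final.items()}, rounds)
```

The two stopping conditions mirror the ones described in the article: the loop ends either when a stable result is reached or when the number of iterations reaches its upper limit.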

Some researchers have incorporated global economic and currency metrics alongside blockchain data in their models, reflecting a similar view of which features affect cryptocurrency prices. Others use sentiment analysis to understand the influence of public opinion on price fluctuations. Few studies have looked at predicting multiple currencies at once, and C2P2 may be a promising attempt.