With the surge in cryptocurrencies there has been speculation about market manipulation. I wanted to see whether such manipulation (or attempts at it) could be tracked.

It seems a large population of cryptocurrency traders communicate via Twitter. We can ingest this chatter by looking for any mention of a coin's symbol (e.g. $BTC) and recording the market conditions alongside it. By correlating the two we may be able to spot attempts to manipulate the community's perception of the market or of a coin.

An example tweet that would be correlated using $BTC.

Ingesting data

Twitter

I spun up a virtual machine and installed the ELK stack. Logstash already has an input plugin capable of pulling data from Twitter. I encountered a few issues: characters had to be escaped ($ becomes %24), there was a restriction of around 100 keywords, and the plugin was generally unstable. However, with the configuration below I managed to capture the required data along with the sentiment for each message:

input {
        twitter {
                consumer_key => "xxxxxx"
                consumer_secret => "xxxxxx"
                oauth_token => "xxxxxx"
                oauth_token_secret => "xxxxxx"
                ignore_retweets => true
                languages => ["en"]
                full_tweet => true
                keywords => [
                        "%24BTC", "%24ETH", "%24XRP", "%24BCH", "%24ADA", "%24LTC", "%24XEM", "%24MIOTA", "%24XLM", "%24DASH", "%24NEO", "%24TRX",
                        "%24EOS", "%24XMR", "%24ICX", "%24BTG", "%24XRB", "%24QTUM", "%24ETC", "%24LSK", "%24OMG", "%24XVG", "%24SC", "%24ZEC",
                        "%24BCN", "%24BCC", "%24PPT", "%24KCS", "%24STRAT", "%24VEN", "%24BNB", "%24BTS", "%24USDT", "%24ARDR", "%24DOGE", "%24SNT",
                        "%24STEEM", "%24DCN", "%24WAVES", "%24DRGN", "%24REP", "%24ZRX", "%24DGB", "%24VERI", "%24DENT", "%24ARK", "%24WAX", "%24HSR",
                        "%24KMD", "%24GNT", "%24BAT", "%24ETN", "%24PIVX", "%24DCR", "%24SALT", "%24ETHOS", "%24FUN", "%24KNC", "%24KIN", "%24RDD",
                        "%24QASH", "%24MED", "%24XP", "%24REQ", "%24AION", "%24FCT", "%24ENG", "%24BTM", "%24AE", "%24NXS", "%24SUB", "%24POWR",
                        "%24ELF", "%24ZCL", "%24GAS", "%24NEBL", "%24RHOC", "%24GBYTE", "%24XDN", "%24MAID", "%24MONA", "%24SYS", "%24BTCD", "%24NXT",
                        "%24COB", "%24SAN", "%24LINK", "%24DGD", "%24ICN", "%24WTC", "%24DBC", "%24GNO", "%24QSP", "%24BNT", "%24UTK", "%24STORM",
                        "%24XZC", "%24GAME", "%24POE", "%24CVC", "%24VEE", "%24VIBE", "%24RDN", "%24PAY", "%24ACT", "%24GXS", "%24RPX", "%24SMART",
                        "%24TNB", "%24PAC", "%24SKY", "%24XPA", "%24LEND", "%24STORJ", "%24ENJ", "%24PPP", "%24XBY", "%24VTC", "%24AST", "%24PLR",
                        "%24XCP", "%24OST", "%24EMC", "%24BLOCK", "%24CNX", "%24WABI", "%24SRN", "%24R", "%24BCO", "%24NAV", "%24DTR", "%24MCO",
                        "%24PART", "%24UBQ", "%24RCN", "%24MANA", "%24ANT", "%24CMT", "%24EDG", "%24BAY", "%24CND", "%24CTR", "%24SNM", "%24MOD",
                        "%24UKG", "%24SNGLS", "%24ITC", "%24ATM", "%24FUEL", "%24DATA", "%24NULS", "%24RLC", "%24SPANK", "%24ADX", "%24ZEN", "%24WGR",
                        "%24BRD", "%24PPC", "%24QRL", "%24THC", "%24AMB", "%24MGO", "%24WINGS", "%24DNT", "%24EMC2", "%24SNOV", "%24ETP", "%24XAS",
                        "%24BURST", "%24TRIG", "%241ST", "%24LBC", "%24VIA", "%24MLN", "%24MTH", "%24NLG", "%24LUN", "%24DCT", "%24HST", "%24EDO",
                        "%24RISE", "%24TNT", "%24AGRS", "%24MOON", "%24SHIFT", "%24LRC", "%24TRST", "%24GRID", "%24FLASH", "%24AEON", "%24MTL", "%24PRL",
                        "%24CLOAK", "%24PRE", "%24TKN", "%24GTO", "%24XSH", "%24GVT", "%24INK", "%24PURA", "%24CDT", "%24VOX", "%24LA", "%24BNTY",
                        "%24DLT", "%24DPY", "%24XSPEC", "%24VIB", "%24COSS", "%24SLR", "%24FTC", "%24ADT", "%24GRS", "%24IXT", "%24VOISE", "%24YOYOW",
                        "%24TAAS", "%24IOC", "%24XEL", "%24ONION", "%24JINN", "%24CFI", "%24SLS", "%24GUP", "%24VRC", "%24NMC", "%24PAYX", "%24WRC",
                        "%24DAT", "%24AMP", "%24HVN", "%24ECC", "%24DIME", "%24PASC", "%24POT", "%24EVX", "%24XNN", "%24ZSC", "%24HMQ", "%24BITCNY",
                        "%24MNX", "%24PEPECASH", "%24DRT", "%24NLC2", "%24PKT", "%24CRW", "%24DMD", "%24GRC", "%24AIR", "%24RVT", "%24BLK", "%24SIB",
                        "%24ELIX", "%24MER", "%24BCPT", "%24NET", "%24XMY", "%24DOVU", "%24LOC", "%24POLL", "%24MSP", "%24NSR", "%24PRO", "%24SNC",
                        "%24RADS", "%24DNA", "%24LIFE", "%24TGT", "%24NMR", "%24COLX", "%24BLUE", "%24MDA", "%24NYC", "%24RC", "%24FAIR", "%24NEU",
                        "%24BTX", "%24ART", "%24PPY", "%24ARN", "%24MYST", "%24STX", "%24TIX", "%24ION", "%24PBL"
                ]
        }       
}
filter {
        if [extended_tweet] and [extended_tweet][full_text] and [extended_tweet][full_text] != ""  {
                mutate {
                        add_field => { "message" => "%{[extended_tweet][full_text]}" }
                }
        } else {
                mutate {
                        add_field => { "message" => "%{text}" }
                }
        }
        if [message] {
                sentimentalizer {
                        source => "message"
                }
        }
        if [user] {
                mutate {
                        add_field => {
                                "user_created" => "%{[user][created_at]}"
                                "user_followers_count" => "%{[user][followers_count]}"
                                "user_friends_count" => "%{[user][friends_count]}"
                                "user_name" => "%{[user][name]}"
                                "user_screen_name" => "%{[user][screen_name]}"
                                "user_statuses_count" => "%{[user][statuses_count]}"
                                "user_verified" => "%{[user][verified]}"
                        }
                }
        }
        if [source] {
                mutate {
                        gsub => ["source", "^\<a\s.*?\>|\<\/a\>$", ""]
                }
        }
        prune {
                whitelist_names => [
                        "created_at",
                        "id",
                        "sentiment.*",
                        "source",
                        "message",
                        "^user\_",
                        "timestamp",
                        "coins"
                ]
                blacklist_names => [
                        "in_reply_to_*",
                        "quoted_status_*"
                ]
        }
        mutate {
                convert => {
                        "user_followers_count" => "integer"
                        "user_friends_count" => "integer"
                        "user_statuses_count" => "integer"
                        "user_verified" => "boolean"    
                }
        }
}

output {
        elasticsearch {
                hosts => ["127.0.0.1:9200"]
                index => "twitter-%{+YYYY.MM.dd}"
        }
}
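
The escaped keyword list does not have to be maintained by hand. As a minimal sketch (not the method used for the configuration above), the following Python generates the `%24`-prefixed entries from plain ticker symbols using the standard library's `urllib.parse.quote`. The function names `build_keywords` and `to_logstash_array` are hypothetical helpers introduced for illustration.

```python
from urllib.parse import quote


def build_keywords(symbols):
    """Return percent-encoded cashtag keywords, e.g. "btc" -> "%24BTC"."""
    keywords = []
    for symbol in symbols:
        cashtag = "$" + symbol.strip().upper()
        # quote() percent-encodes "$" as "%24"; safe="" leaves no
        # reserved characters unescaped.
        keywords.append(quote(cashtag, safe=""))
    return keywords


def to_logstash_array(keywords):
    """Format the keywords as a Logstash array literal (hypothetical helper)."""
    return "[" + ", ".join('"{}"'.format(k) for k in keywords) + "]"


print(to_logstash_array(build_keywords(["btc", "eth", "1ST"])))
# prints: ["%24BTC", "%24ETH", "%241ST"]
```

The printed array can be pasted into the `keywords` setting of the `twitter` input.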

To reduce false positives I only tracked coin acronyms prefixed with a dollar sign (e.g. $BTC). Tracking a coin like $STORM by its hashtag, #STORM, can result in ingesting unrelated tweets, such as those referring to weather events. The end result:

Example of tweet ingested into ELK

Cryptos

To ingest current market conditions I wrote a Ruby script using the CoinMarketCap API. A systemd timer ran the script every 15 minutes to pull this information and send it to Elasticsearch. This resulted in a nice dataset:

Coin Market Cap data
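
The Ruby script itself is not reproduced here. As a rough sketch of the indexing half of that job, written in Python rather than Ruby, the function below turns a list of price records into a newline-delimited body for the Elasticsearch bulk API. The field names `symbol`, `price_aud` and `created_at` and the `coins-` index prefix are taken from the Timelion expression later in this post; the daily index suffix, the input record shape and the name `build_bulk_body` are assumptions. Fetching the prices and POSTing the body to `/_bulk` are left out, and older Elasticsearch versions may also expect a document type in the action line.

```python
import json
from datetime import datetime, timezone


def build_bulk_body(tickers, now=None):
    """Build an NDJSON bulk payload from ticker dicts (assumed shape)."""
    if not tickers:
        return ""
    now = now or datetime.now(timezone.utc)
    # Assumed naming: one index per day, matching the coins-* pattern.
    index = "coins-" + now.strftime("%Y.%m.%d")
    lines = []
    for ticker in tickers:
        # Action line, followed by the document to index.
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps({
            "symbol": ticker["symbol"],
            "price_aud": float(ticker["price_aud"]),
            "created_at": now.isoformat(),
        }))
    # The bulk API requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


# Illustrative record only, not real market data.
print(build_bulk_body([{"symbol": "BTC", "price_aud": "10500.5"}]))
```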

Disclaimer

Please do not trade on this information; I am not a financial or statistical expert. Sentiment analysis isn't perfect. You could argue the message below is positive:

Bad sentiment example

There are also huge amounts of spam and scams:

Bot Spam

The results may be skewed. If you do plan on using this data you do so at your own risk.

Some statistics

Between 17 March 2018 and 17 June 2018 the system ingested:

  • ~4 GB of Elasticsearch data.
  • 4,286,130 coin price data points.
  • 1,455,454 coin-related tweets, of which:
    • 1,082,358 positive.
    • 279,624 negative.
    • 93,472 neutral.
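
To put the sentiment counts above in proportion, a quick calculation gives each label's share of the total. The three counts add up to exactly the 1,455,454 tweets listed, and the split is roughly 74% positive, 19% negative and 6% neutral.

```python
# Tweet counts by sentiment, copied from the list above.
counts = {"positive": 1_082_358, "negative": 279_624, "neutral": 93_472}
total = sum(counts.values())


def share(count, total):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100.0 * count / total, 2)


shares = {label: share(n, total) for label, n in counts.items()}
print(total, shares)
```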

The accounts with the most tweets looked to be bots (no surprise here):

Top tweeting accounts

Examining the data

Reviewing the data was a lot easier with the Timelion functionality inside Kibana. I used the following expression to visualize and overlay tweet sentiment and market conditions:

.es(index=coins-*, timefield=created_at, metric=max:price_aud, q='symbol:BTC').label('Trend AUD').color('#f7eb4a').trend().points(symbol=cross, radius=2), .es(index=coins-*, timefield=created_at, metric=max:price_aud, q='symbol:BTC').fit('carry').label('Price AUD').color('#95259b').yaxis(1, label='AUD').lines(fill=1), .es(index=twitter-*, timefield=@timestamp, metric=count, split=sentiment.polarity.keyword:3, q='$BTC', fit=none).bars(stack=false, width=6).label('Sentiment: $1', '.*keyword:(.*?) .*').yaxis(2, label='Count').color('#259b5e:#9b2525:#aaaaaa')

This produced the following chart. The purple line is the price of the coin (AUD) along with a yellow trend line. The bar chart indicates tweet count by sentiment, green for positive, grey for neutral and red for negative.

BTC

You can view a specific coin's graph by selecting a link below:

Investigation #1

Reviewing the $BNB chart, we can see an uptick in price along with positive sentiment between 25 and 28 February.

BNB

Details

| Start | End | Min $ (AUD) | Max $ (AUD) | Difference (%) |
|---|---|---|---|---|
| 2018-02-25 10:00 | 2018-02-28 10:00 | 11.414 | 14.106 | 2.692 (23.59%) |
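
The difference column is the absolute change between the minimum and maximum price, with the change expressed as a percentage of the minimum in brackets. A short sketch of that arithmetic, using the $BNB figures above (`price_change` is a hypothetical helper name):

```python
def price_change(min_price, max_price):
    """Return (absolute difference, percentage change relative to the minimum)."""
    diff = max_price - min_price
    return diff, 100.0 * diff / min_price


diff, pct = price_change(11.414, 14.106)
print(f"{diff:.3f} ({pct:.2f}%)")
# prints: 2.692 (23.59%)
```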

If we take a look at the top Twitter users during this period and break down the sentiment we can see a familiar pattern across the screen names:

investigation-1-top-50
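
The breakdown above was produced in Kibana. For anyone working from an export of the data instead, the same per-account view could be approximated along these lines. This is a sketch under assumptions: each tweet is a dict carrying the `user_screen_name` field from the Logstash configuration and a nested `sentiment.polarity` value (the field queried in the Timelion expression); the sample records are invented.

```python
from collections import Counter, defaultdict


def top_accounts(tweets, limit=50):
    """Rank accounts by tweet count and break each one down by sentiment."""
    per_user = defaultdict(Counter)
    for tweet in tweets:
        name = tweet.get("user_screen_name")
        polarity = tweet.get("sentiment", {}).get("polarity", "unknown")
        if name:
            per_user[name][polarity] += 1
    # Sort by total tweets per account, busiest first.
    ranked = sorted(per_user.items(),
                    key=lambda item: sum(item[1].values()),
                    reverse=True)
    return [(name, dict(counts)) for name, counts in ranked[:limit]]


# Invented sample records for illustration only.
sample = [
    {"user_screen_name": "acct_a", "sentiment": {"polarity": "positive"}},
    {"user_screen_name": "acct_a", "sentiment": {"polarity": "positive"}},
    {"user_screen_name": "acct_b", "sentiment": {"polarity": "negative"}},
]
print(top_accounts(sample, limit=5))
```

Accounts that post a high volume of tweets with a single polarity stand out immediately in this view.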

Taking the top 5 Twitter handles excluding DogeClub:

  • Ирина Романова
  • Екатерина Волошина
  • Юля Морозова
  • Елена Ефремова
  • Анастасия Козлова

We find that all 5 Twitter users have seemingly random screen names and barely any followers, all use Android, and at the time of writing all of the accounts are suspended.

investigation-1-similar-values

An example of some of the tweets:

investigation-1-tweets

They are all advertising a Telegram group named "Symetracryptosignal". Interestingly, DogeClub was also sending tweets referring to a separate Telegram group.

The spike in positive sentiment and the account suspensions indicate automated bots. It's hard to determine whether these tweets had any effect on the $BNB price fluctuation, but I find it unlikely. A well-timed coincidence?

Investigation #2

Again, the $DENT chart shows a significant increase in positive tweets, followed by a 24% jump in price over the next 9 hours.

DENT

Details

| Start | End | Min $ (AUD) | Max $ (AUD) | Difference (%) |
|---|---|---|---|---|
| 2018-03-03 16:00 | 2018-03-05 23:59 | 0.025 | 0.031 | 0.006 (24%) |

The difference here is that the positive-sentiment tweets stop 4 hours before the price increases. The number of tweets per hour isn't drastic, but manipulating a smaller cryptocurrency's price would be easier than manipulating, say, Bitcoin or Ethereum.

Taking a look at the top accounts:

investigation-2-top-50

Again the top 5 accounts account for a large portion of the positive tweets:

  • Marie Gustman
  • Viktor
  • Leah Nash
  • Cheryl Eddington
  • иван

investigation-2-tweets

It became clear very quickly that this is a similar campaign to the first investigation. One look at the tweets shows suspended accounts tweeting about Telegram groups.

Investigation #3

The last two investigations both led to bots posting positive sentiment. If we focus on negative sentiment we see very few charts exhibiting extreme fluctuations. Where negative sentiment does spike, it usually only lasts for one bar (~1 hour). Some coin charts that exhibit this include $CVC, $ETN, $PPT, $RHOC, $SNT and $ZCL.

We can see from the screenshot below that a lot of tweets are incorrectly tagged as negative. I would hazard a guess that, due to Twitter's small character count, the sentiment analysis has a hard time profiling tweets.

investigation-3-tweets

Conclusions

It's become clear that sentiment analysis on fewer than 50 words is unreliable. The habit of including many coin symbols in one tweet skews the results further. In hindsight I should have parsed every coin symbol from each message and only used tweets with fewer than 4 symbols.
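
The filtering idea from the paragraph above could look something like the following. This is a sketch of what I would try, not something that was run against the dataset: the regular expression, the helper names and the 10-character limit on symbols are all assumptions. It pulls the distinct $-prefixed symbols out of a message and keeps only messages that mention between one and three of them.

```python
import re

# A "$" not preceded by a word character or another "$", followed by a
# short alphanumeric symbol containing at least one letter, so that
# "$1ST" matches but "$100" does not.
CASHTAG = re.compile(
    r"(?<![A-Za-z0-9_$])\$(?=[A-Za-z0-9]*[A-Za-z])([A-Za-z0-9]{1,10})\b"
)


def extract_symbols(message):
    """Return the distinct coin symbols in a message, in order of appearance."""
    seen = []
    for match in CASHTAG.finditer(message):
        symbol = match.group(1).upper()
        if symbol not in seen:
            seen.append(symbol)
    return seen


def is_focused(message, max_symbols=3):
    """True if the message mentions at least one and at most max_symbols coins."""
    return 0 < len(extract_symbols(message)) <= max_symbols


print(extract_symbols("Buy $BTC and $eth now, $BTC to $100!"))
# prints: ['BTC', 'ETH']
```

The extracted list could also be stored with each tweet, so that a chart for one coin only counts tweets that are actually focused on it.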

This experiment has shown that to track market manipulation you need more than social media data. If it were possible to access trading records with links to Twitter users, you might be able to visualize manipulations.

A pity there was no solid evidence I could find given the recent media reports. If you'd like the data to analyse yourself or have suggestions feel free to drop me an email.

Sidenote

I thought a good example of sentiment analysis not working was graphing any tweet containing the word 'scam'. I would have expected a greater percentage of negative tweets (5,219 in total).

sentiment