ation, and maximum percentage change, respec-
tively.
On the social media side, it has been shown that
organized price manipulation actually takes place there
(Kamps and Kleinberg, 2018). The least active groups
carry out only one P&D operation per week, whilst the
most active groups carry out roughly one operation
every day.
Generally speaking, the following steps are taken dur-
ing the procedure (Martineau, 2018):
• A few days or hours prior to the operation, the
administrators announce that a P&D will take place,
the exchange to be used, the precise time the
operation will begin, and whether the operation will
be Free for All, in which case everyone
receives the message simultaneously, or Ranked,
in which case VIPs and members at higher levels
in the hierarchy receive the initial message before
other members;
• The announcement is repeated more frequently as the
operation’s execution time draws near;
• The organizers give some straightforward advice
just before the event begins: check your Internet
connection, buy low and sell high, and hold the
currency for as long as you can while waiting for
outside investors;
• The free chat rooms are closed at this point to
prevent “Fear, Uncertainty, and Doubt” (FUD), that
is, deceptive messages spread by those looking to
disrupt the operation and cause panic within the
group;
• At the designated time, which depends on each
member’s position in the group hierarchy, the
targeted cryptocurrency is revealed. Its name is
typically written in a distorted image that can only
be read accurately by humans. The purpose of the
obfuscation is to hinder bots from parsing the
message with OCR methods and reacting more quickly
than humans;
• Shortly after the operation begins, the admins
release a news item and ask everyone in the group to
spread the message that the price of the
cryptocurrency is increasing. This is done on Twitter,
in forums, and in special chats. By presenting the
pump as a special investment opportunity, this
activity aims to draw in outside investors;
• Finally, the admins reopen the free chat rooms
after the operation and share some P&D statistics
with the users.
Building on this process, we started from the
timestamps of the manually labelled pumps and the
Telegram group links from which the authors retrieved
the information. We joined these groups and, through
the Telegram APIs, downloaded all the messages
exchanged in the chats from two days before to two
days after each pump.
However, only some of these Telegram groups allowed
access to the message history, so we went from a data
set of 104 pumps to one of 89 pumps.
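The restriction to a four-day window around each pump can be sketched as follows. This is a minimal illustration that assumes the chat history has already been downloaded as (timestamp, text) pairs; the function name `messages_in_window`, the sample messages, and the inclusive window boundaries are assumptions made for the example, not details taken from our pipeline.

```python
from datetime import datetime, timedelta

def messages_in_window(messages, pump_time, days_before=2, days_after=2):
    """Keep the (timestamp, text) pairs that fall inside the window
    [pump_time - days_before, pump_time + days_after] (inclusive)."""
    start = pump_time - timedelta(days=days_before)
    end = pump_time + timedelta(days=days_after)
    return [(ts, text) for ts, text in messages if start <= ts <= end]

# Hypothetical chat history around a pump at 2019-01-10 18:00.
pump_time = datetime(2019, 1, 10, 18, 0)
messages = [
    (datetime(2019, 1, 7, 12, 0), "too early"),
    (datetime(2019, 1, 9, 9, 30), "pump announced for tomorrow"),
    (datetime(2019, 1, 10, 18, 0), "coin revealed"),
    (datetime(2019, 1, 12, 17, 59), "pump statistics"),
    (datetime(2019, 1, 13, 8, 0), "too late"),
]
window = messages_in_window(messages, pump_time)
print(len(window))  # number of messages kept for this pump
```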
Using Natural Language Processing techniques, we
pre-processed these messages in the following
steps:
• Removal of Emoji, Images, Stop-Words and
External Links: process of reducing non-
informative text;
• Lemmatization: process of grouping together the
inflected forms of a word so they can be analysed
as a single item;
• POS Tagging: process of marking up a word in a
text as corresponding to a particular part of speech
based on both its definition and its context; we
keep only nouns (NOUN) and proper nouns (PROPN).
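The first pre-processing step can be sketched with the standard library alone. The snippet below is only an illustration: the emoji ranges, the tiny stop-word list, the sample message, and its placeholder link are assumptions for the example. Lemmatization and POS tagging are deliberately left out, since they require a trained language model from an NLP library.

```python
import re

# External links (http/https or bare www).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Rough emoji ranges (assumption: covers common pictographs, not all of Unicode).
EMOJI_RE = re.compile("[\U0001F000-\U0001FAFF\u2600-\u27BF\uFE0F]")
TOKEN_RE = re.compile(r"[a-z0-9]+")
# Tiny illustrative stop-word list; a real list would be much longer.
STOP_WORDS = {"the", "a", "an", "is", "in", "on", "at", "to", "and", "of", "will", "be"}

def clean_message(text):
    """Remove links and emoji, lowercase, tokenize, and drop stop-words."""
    text = URL_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    tokens = TOKEN_RE.findall(text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

sample = "\U0001F680 The pump will be on Binance at 18:00 GMT! https://t.me/example"
print(clean_message(sample))  # ['pump', 'binance', '18', '00', 'gmt']
```

Lowercasing loses the original casing of currency symbols, so any later symbol matching on these tokens would have to be case-insensitive.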
After gathering the messages from the relevant chat
for each pump index, we proceeded to the stage of
extracting the features.
In particular, for each of the three data sets created
by the authors (La Morgia, 2020), and for each
financial transaction (and consequently each pump
index) defined within them, the new feature takes the
value 1 if the currency symbol occurs in the messages
exchanged in the 5 minutes before the transaction was
made, and 0 otherwise. We thus produce the final data
set, which consists of 89 pumps, by adding a single
categorical feature to the financial ones.
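The rule for the new feature can be illustrated with a short sketch. The function name `telegram_feature`, the symbol `XYZ`, the sample messages, the whole-word case-insensitive match, and the half-open window (messages sent exactly at the transaction time are excluded) are all assumptions made for the example.

```python
import re
from datetime import datetime, timedelta

def telegram_feature(symbol, tx_time, messages, minutes=5):
    """Return 1 if `symbol` occurs as a whole word in any message sent in the
    `minutes` before `tx_time`, otherwise 0."""
    start = tx_time - timedelta(minutes=minutes)
    pattern = re.compile(rf"\b{re.escape(symbol)}\b", re.IGNORECASE)
    for ts, text in messages:
        if start <= ts < tx_time and pattern.search(text):
            return 1
    return 0

# Hypothetical messages from one group's chat.
messages = [
    (datetime(2019, 1, 10, 17, 50), "Pump in 10 minutes"),
    (datetime(2019, 1, 10, 17, 58), "The coin is XYZ, buy now"),
]
print(telegram_feature("XYZ", datetime(2019, 1, 10, 18, 0), messages))   # 1
print(telegram_feature("XYZ", datetime(2019, 1, 10, 18, 10), messages))  # 0
```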
3.2 Model
To assess the scores and the significance of the new
Telegram feature, we employed the classifier adopted
by the authors (La Morgia, 2020), namely the Random
Forest classifier. A Random Forest is a collection of
decision tree classifiers, each of which depends on
the values of a random vector sampled independently
and casts a vote, with the prediction being the class
with the most votes overall (Breiman, 2001). Since our
data set consisted of 89 pump and dumps, we did not
divide it into conventional train and test sets.
Instead, we used 5-fold cross-validation to provide a
more reliable assessment of the performance. For the
Random Forest classifier, we use a forest of 200
trees, a maximum depth of 4 for each tree, and at
least 6 samples in each leaf node.
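This configuration can be sketched with scikit-learn as follows. The snippet is an illustration rather than the experimental code: the data are synthetic stand-ins generated with `make_classification`, and the stratified, shuffled folds, the fixed random seeds, and the F1 scoring are assumptions made for the example. Only the three hyperparameters and the number of folds come from the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced stand-in data (assumption): the real financial and
# Telegram features are not reproduced here.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)

# Hyperparameters as stated in the text: 200 trees, at least 6 samples per
# leaf, and a maximum depth of 4.
clf = RandomForestClassifier(
    n_estimators=200, min_samples_leaf=6, max_depth=4, random_state=0,
)

# 5-fold cross-validation; stratification and shuffling are assumptions.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="f1")
print(scores.mean())
```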