Advanced Chinese Rap Lyric Generation with Integrated Markov Chain and LSTM Models

Songwei Li

2024

Abstract

This paper aims to generate Chinese rap lyrics using two machine learning techniques: Markov Chains and Long Short-Term Memory (LSTM) models. The project begins with the collection and cleaning of Chinese rap lyrics data, followed by preprocessing that includes word segmentation and tagging with Jieba. I first constructed a Markov Chain model based on enhanced tag analysis for basic lyric generation. I then built an LSTM model that learns from sequences of lyrics to predict the next word in a sequence. To train it, I converted the lyrics into sequences of tokens and created the corresponding labels. The LSTM architecture, which includes embedding and LSTM layers, was designed for text generation, and I adjusted its hyperparameters during training to improve performance. In the testing and evaluation phase, I assessed the uniqueness and coherence of the lyrics produced by the Markov Chain model. For the LSTM model, I used quantitative metrics such as perplexity or BLEU scores to evaluate the linguistic quality of the generated lyrics, and I also assessed their creativity, thematic consistency, and overall appeal.
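The two data-driven steps described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: it assumes the lyrics are already segmented into word lists (the paper uses Jieba for this), it uses a plain word-level first-order Markov chain rather than the tag-enhanced model, and the helper names `build_markov_chain`, `generate_line`, and `make_training_pairs`, the toy corpus, and the window size are all hypothetical.

```python
import random
from collections import defaultdict


def build_markov_chain(lines):
    """Map each token to the list of tokens observed right after it."""
    chain = defaultdict(list)
    for tokens in lines:
        for current, following in zip(tokens, tokens[1:]):
            chain[current].append(following)
    return dict(chain)


def generate_line(chain, start, max_len=8, rng=None):
    """Random walk over the chain, stopping at max_len or at a dead end."""
    rng = rng or random.Random()
    out = [start]
    while len(out) < max_len and out[-1] in chain:
        out.append(rng.choice(chain[out[-1]]))
    return out


def make_training_pairs(token_ids, window=3):
    """Slide a fixed window over token ids: (context, next-token label)."""
    return [
        (token_ids[i:i + window], token_ids[i + window])
        for i in range(len(token_ids) - window)
    ]


# Hypothetical pre-segmented lines (assumption: segmentation already done).
corpus = [
    ["我", "在", "街头", "写", "歌"],
    ["我", "在", "深夜", "写", "梦"],
]

# Markov chain step: learn transitions, then sample a new line.
chain = build_markov_chain(corpus)
line = generate_line(chain, "我", rng=random.Random(0))

# LSTM data-preparation step: integer-encode tokens, then build
# (input sequence, next-word label) pairs for supervised training.
vocab = {tok: i for i, tok in enumerate(sorted({t for l in corpus for t in l}))}
ids = [vocab[t] for t in corpus[0]]
pairs = make_training_pairs(ids, window=3)
```

Each pair produced by `make_training_pairs` is the kind of (sequence, label) example a next-word model with an embedding layer followed by an LSTM layer would be trained on; the model itself is omitted here because the abstract does not give its dimensions or hyperparameters.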


Paper Citation


in Harvard Style

Li S. (2024). Advanced Chinese Rap Lyric Generation with Integrated Markov Chain and LSTM Models. In Proceedings of the 1st International Conference on Data Science and Engineering - Volume 1: ICDSE; ISBN 978-989-758-690-3, SciTePress, pages 371-376. DOI: 10.5220/0012842800004547


in Bibtex Style

@conference{icdse24,
author={Songwei Li},
title={Advanced Chinese Rap Lyric Generation with Integrated Markov Chain and LSTM Models},
booktitle={Proceedings of the 1st International Conference on Data Science and Engineering - Volume 1: ICDSE},
year={2024},
pages={371-376},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012842800004547},
isbn={978-989-758-690-3},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 1st International Conference on Data Science and Engineering - Volume 1: ICDSE
TI - Advanced Chinese Rap Lyric Generation with Integrated Markov Chain and LSTM Models
SN - 978-989-758-690-3
AU - Li S.
PY - 2024
SP - 371
EP - 376
DO - 10.5220/0012842800004547
PB - SciTePress