The Rap of China: A Rhyming Robot

Posted by weixin_34037977 on 2017-09-15

[This article is from 天外歸雲's blog on cnblogs]

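Introduction to the Rhyming Robot

Recently I saw someone in a group chat talking about rhyming machines, and it brought back memories from many years ago.

On a whim I wrote a rhyming robot. It can identify the rhyme of a word, compare two rhymes, and sort a word list into groups by rhyme.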

 

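Testing shows that polyphonic characters (characters with more than one reading) are not handled well yet: 嘮嗑 and 嘮叨, for example, are recognized incorrectly. Feel free to keep testing and send me any problems you find.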
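One possible way to reduce the polyphone problem is to convert the whole word and then take the last syllable, instead of converting the last character on its own, so that the converter can use the surrounding characters as context. The sketch below is only an illustration and is not part of punchliner.py. The helper name `last_syllable` and its `to_pinyin` parameter are assumptions introduced here; `to_pinyin` stands for any callable shaped like `pypinyin.pinyin`, which takes a string and returns one list of readings per character. Whether a given word is actually read correctly still depends on the converter's phrase data.

```python
def last_syllable(word, to_pinyin):
    """Return the reading of the last character of `word`, taken in context.

    `to_pinyin` is a hypothetical injected converter with the same shape as
    pypinyin.pinyin: str -> list of lists of readings, one list per character.
    """
    if not word:
        return None
    # Convert the whole word rather than only word[-1], so a phrase-aware
    # converter has the chance to pick the reading that fits the word.
    readings = to_pinyin(word)
    if not readings or not readings[-1]:
        return None
    # Use the first (preferred) reading of the last character.
    return readings[-1][0]
```

With pypinyin installed, this would be called as `last_syllable(word, pinyin)` after `from pypinyin import pinyin`.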

 

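Pinyin recognition is built on the pypinyin library; see its GitHub page for usage details.

Rhyming Robot Code

The rhyming robot code lives in a file named "punchliner.py". Here is the code: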

from pypinyin import pinyin, lazy_pinyin, Style

words = ["今天","太躁","艾福傑尼","著迷","太繞","心間","","盛宴","榴蓮","虧欠","二百五","腐乳","火鍋底料","MC大笑","別跟我嘮","我感冒","好不好","太早","住","兄弟","胸臆","太辣","太大","太炸","我手抖"]

def is_alphabet(uchar):
    # True for plain ASCII letters A-Z / a-z; tone-marked vowels are not included
    is_upper = 'A' <= uchar <= 'Z'
    is_lower = 'a' <= uchar <= 'z'
    return is_upper or is_lower

def get_punchline(word):
    # Split the pinyin of the word's last character around its tone-marked
    # vowel and return [prefix, tone-marked vowel, suffix].
    # Returns [] for an empty word or when no tone-marked vowel is found.
    if not word:
        return []
    last_character = word[-1]
    last_character_pinyin = pinyin(last_character)[0][0]
    punchline = []
    for the_char in last_character_pinyin:
        if not is_alphabet(the_char):
            prefix, suffix = last_character_pinyin.split(the_char, 1)
            punchline = [prefix, the_char, suffix]
    return punchline

def compare_punchline(word1, word2):
    punchline1 = get_punchline(word1)
    punchline2 = get_punchline(word2)
    # A word without a recognizable rhyme never rhymes with anything
    if len(punchline1) < 3 or len(punchline2) < 3:
        return False
    prefix1 = punchline1[0]
    prefix2 = punchline2[0]
    # Default the last letter of each prefix to a non-empty placeholder
    prefix1_last_char = 'x'
    prefix2_last_char = 'x'
    if prefix1 != '':
        prefix1_last_char = prefix1[-1]
    if prefix2 != '':
        prefix2_last_char = prefix2[-1]
    # Precondition on the prefixes: both end in i, or neither does
    pre_rule1 = (prefix1_last_char == 'i')
    pre_rule2 = (prefix2_last_char == 'i')
    all_i = (pre_rule1 and pre_rule2)
    all_not_i = 'i' not in [prefix1_last_char, prefix2_last_char]
    if not (all_i or all_not_i):
        return False
    # The tone-marked vowel and everything after it must both match
    rule1 = punchline1[1] == punchline2[1]
    rule2 = punchline1[2] == punchline2[2]
    return rule1 and rule2

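def classify_punchline(words_list):
    # Drop empty entries, which have no last character to inspect
    words_list = [word for word in words_list if word]
    if not words_list:
        return
    target = words_list[0]
    # Collect every word in this list that rhymes with the target
    yayun_words = filter(lambda word: compare_punchline(target, word), words_list)
    # Always include the target itself so the remaining list shrinks each round
    yayun_words_list = list({target} | set(yayun_words))
    left_words_list = list(set(words_list) - set(yayun_words_list))
    print(yayun_words_list)
    # Repeat on the words that have not been grouped yet
    if left_words_list:
        classify_punchline(left_words_list)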
    
if __name__ == '__main__':
    #print(get_punchline("變"))
    #print(get_punchline("案"))
    #print(get_punchline("繞"))
    #print(compare_punchline("安","翻"))
    #print(compare_punchline("變","案"))
    #print(compare_punchline("房","狼"))
    #print(get_punchline("嘮"))
    classify_punchline(words)

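In this code:

1. The function classify_punchline checks the words in a word list and automatically groups the ones that rhyme;

2. The function get_punchline returns the rhyme of a word;

3. The function compare_punchline compares two rhymes.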
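The three functions above compare the tone-marked vowel directly, so two words only count as rhyming when they also share a tone. As a rough illustration of a tone-insensitive variant, the sketch below works on pinyin strings alone and needs only the standard library. The names `strip_tone`, `rhyme_key`, and `group_by_rhyme` are made up for this example and are not part of punchliner.py; the input is assumed to be tone-marked syllables like those returned by pypinyin's `pinyin`.

```python
import unicodedata
from collections import defaultdict


def strip_tone(text):
    """Remove tone marks from pinyin text, keeping the two dots on ü."""
    result = []
    for ch in text:
        decomposed = unicodedata.normalize("NFD", ch)
        base = decomposed[0]
        # Keep only the diaeresis (U+0308) so ü stays distinct from u.
        kept = [mark for mark in decomposed[1:] if mark == "\u0308"]
        result.append(unicodedata.normalize("NFC", base + "".join(kept)))
    return "".join(result)


def rhyme_key(syllable):
    """Return (prefix ends with i, toneless vowel, suffix), or None.

    Mirrors the split used by get_punchline, but drops the tone so that
    syllables differing only in tone share a key.
    """
    syllable = unicodedata.normalize("NFC", syllable)
    for index, ch in enumerate(syllable):
        base = strip_tone(ch)
        if base != ch:  # this character carried a tone mark
            prefix = syllable[:index]
            suffix = syllable[index + 1:]
            return (prefix.endswith("i"), base, suffix)
    return None  # no tone mark found, e.g. a neutral-tone syllable


def group_by_rhyme(syllables):
    """Group syllables by rhyme key in a single pass."""
    groups = defaultdict(list)
    for syllable in syllables:
        groups[rhyme_key(syllable)].append(syllable)
    return dict(groups)
```

Grouping with a dictionary keyed on `rhyme_key` visits each syllable once, instead of re-filtering the remaining list on every round as the recursive version does.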

I hope that one day an AlphaRapper can be invented the way AlphaGo was, and sent to compete on The Rap of China.

Run result:

 
