The logic here is simple: apply the Markov property to generate Donald Trump's speech by considering each word used in the speeches and, for each word, building a dictionary of the words that appear next. Rather than just handing you the code for your project, I want you to understand the concept, so I will do my best to explain it along the way.
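Before touching the real dataset, the idea can be sketched on a tiny, made-up corpus (the sentence below is purely hypothetical, just for illustration): walk through the words and, for each word, collect every word that ever follows it.

```python
# A minimal sketch of the Markov idea on a tiny made-up corpus
text = "the dog saw the cat and the cat saw the dog"
words = text.split()

# For each word, record every word that follows it in the text
followers = {}
for w1, w2 in zip(words, words[1:]):
    followers.setdefault(w1, []).append(w2)

print(followers["the"])  # ['dog', 'cat', 'cat', 'dog']
```

Notice that duplicates are kept on purpose: "dog" follows "the" twice, so later, when we sample uniformly from this list, frequent successors are picked more often.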
1. Start by importing the required library:
import numpy as np
2. Read the datasets
trump = open('C://Users//NeelTemp//Desktop//demos//speeches.txt', encoding='utf8').read()
#display the data
print(trump)
3. Split the datasets into individual words
corpus = trump.split()
#Display the corpus
print(corpus)
4. Next, create a function that generates the pairs of consecutive words in the speeches. To save memory, we'll use a generator object.
def make_pairs(corpus):
    for i in range(len(corpus) - 1):
        yield (corpus[i], corpus[i + 1])

pairs = make_pairs(corpus)
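To see what the generator produces, here is a quick demonstration on a short, made-up word list (the words are hypothetical, not taken from the dataset):

```python
def make_pairs(corpus):
    # Yield each pair of consecutive words, one at a time
    for i in range(len(corpus) - 1):
        yield (corpus[i], corpus[i + 1])

sample = ["make", "america", "great", "again"]
print(list(make_pairs(sample)))
# [('make', 'america'), ('america', 'great'), ('great', 'again')]
```

Because it is a generator, the pairs are produced lazily: nothing is stored until you iterate over it, which is why it saves memory on a large corpus.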
5. Next, let’s initialize an empty dictionary that maps each word to the list of words that follow it.
word_dict = {}
for word_1, word_2 in pairs:
    if word_1 in word_dict:
        word_dict[word_1].append(word_2)
    else:
        word_dict[word_1] = [word_2]
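A point worth making explicit: since repeated followers are stored as duplicates, sampling uniformly from a word's list automatically reproduces the observed next-word frequencies. A small sketch with a hypothetical entry (the counts below are made up):

```python
import numpy as np

# Hypothetical entry: "again" followed "great" twice, "wall" once
word_dict = {"great": ["again", "again", "wall"]}

np.random.seed(0)  # fixed seed so the experiment is repeatable
samples = [np.random.choice(word_dict["great"]) for _ in range(1000)]

# "again" should come up in roughly 2/3 of the draws
print(samples.count("again") / len(samples))
```

This is why the model needs no explicit probability table: the list itself encodes the transition probabilities.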
6. Build the model. We'll randomly pick a starting word from the corpus and then form the chain by repeatedly sampling the next word.
#randomly pick the first word
first_word = np.random.choice(corpus)

#Keep re-picking until we get a capitalized word, so the chain does not start from the middle of a sentence
while first_word.islower():
    first_word = np.random.choice(corpus)

#Start the chain from the picked word
chain = [first_word]

#Initialize the number of simulated words
n_words = 20

#Extend the chain by sampling the next word from the followers of the current word
for i in range(n_words):
    chain.append(np.random.choice(word_dict[chain[-1]]))
7. Finally, let's display the simulated text
#Join returns the chain as a string
print(' '.join(chain))
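The whole pipeline can be put together as one self-contained sketch. It uses a small inline sample text instead of speeches.txt, so it runs without the original dataset (the sentences below are made up for illustration), and it falls back to a random corpus word if the current word has no recorded follower:

```python
import numpy as np

# Small made-up sample standing in for the speeches dataset
text = ("We will make this country great. We will win. "
        "We love this country. We will build it together.")
corpus = text.split()

# Map each word to the list of words that follow it
word_dict = {}
for w1, w2 in zip(corpus, corpus[1:]):
    word_dict.setdefault(w1, []).append(w2)

np.random.seed(42)  # fixed seed so the output is repeatable

# Pick a capitalized starting word so the chain starts at a sentence boundary
first_word = np.random.choice(corpus)
while first_word.islower():
    first_word = np.random.choice(corpus)

chain = [first_word]
n_words = 10
for _ in range(n_words):
    # Sample the next word; fall back to any corpus word if the
    # current word never appeared with a follower
    chain.append(np.random.choice(word_dict.get(chain[-1], corpus)))

print(' '.join(chain))
```

Swapping the inline `text` for the contents of speeches.txt gives you the full model from the steps above.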
And you are done! Congratulations. Have a look at this blog for a better understanding of this concept.