Text-Generating AI in Python
Recurrent neural networks are very powerful when it comes to processing sequential data like text. They are a special type of network in which the output of previous steps is fed back in as input, so the network keeps a kind of memory of what it has already seen. Some variations like LSTMs (Long Short-Term Memory networks) have gating mechanisms inside of them, which allow the network to focus on the most essential information.
In this tutorial, we are going to build an LSTM model and teach it how to write poetic texts like Shakespeare.
Yes, you read that right! We are going to train a neural network to write texts similar to those of the famous poet. At least kind of. What the model is actually going to do is predict the next letter based on the previous ones. So our AI will not really understand the meaning of the text but just guess which next letter makes the most sense. Nonetheless, we will still end up with quite impressive results.
To make such a model effective, we need to train it on a substantial amount of data. For this project we will use a Shakespeare text file that TensorFlow hosts for its examples. Keep in mind, however, that you can use any dataset you want. If you want to imitate Goethe instead of Shakespeare, find a Goethe dataset and use that one instead.
We will start by loading the Shakespeare texts directly into our script using Keras. For this we need to import TensorFlow. Later on we will also need NumPy and the random module.
Notice that you will have to install NumPy and TensorFlow manually, since they are not part of the Python standard library. You can use pip to do this.
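A minimal sketch of the imports and the download could look like this; the URL points to the Shakespeare text file that TensorFlow hosts for its own examples:

```python
import random
import numpy as np
import tensorflow as tf

# Download the Shakespeare text file hosted by TensorFlow.
filepath = tf.keras.utils.get_file(
    "shakespeare.txt",
    "https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt"
)

# Read the file and decode it into one long string.
text = open(filepath, "rb").read().decode(encoding="utf-8")
```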
Preparing The Data
Now we have all the necessary textual data, but we are not able to work with it yet. For us humans text may be understandable, but we cannot just tell a neural network to read the text and learn from it. We need to find a reasonable way to convert the poetry into numerical data.
To start, we are going to slightly change the last line we wrote so that the whole text is converted to lower case. This will make the processing easier and the results better. After that, we are going to create two dictionaries that will serve as our translators. We will use them to convert characters into numbers and vice versa.
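One way to write this, assuming the variable names characters, char_to_index and index_to_char:

```python
# Convert the whole text to lower case to keep the character set small.
text = open(filepath, "rb").read().decode(encoding="utf-8").lower()

# Sorted collection of all unique characters in the text.
characters = sorted(set(text))

# Translation dictionaries: character -> index and index -> character.
char_to_index = {c: i for i, c in enumerate(characters)}
index_to_char = {i: c for i, c in enumerate(characters)}
```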
Notice that we are creating a sorted set of characters before we create the dictionaries. This has two effects. First of all, all the characters are sorted in ascending order. Secondly, by using the set function, we are getting a collection where each character appears only once. In other words, we end up with a sorted list of all characters that appear in the text.
Next, we need to think about what we actually want to do with our model. Our plan is to start with an initial text as the input and to have the model generate a continuation of the text, letter by letter. Based on this plan we are going to structure the training data. For this we need to specify how long the input sequences shall be and how big the step size between the individual training samples is.
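A sketch of this step could look like the following, with the two settings stored in SEQ_LENGTH and STEP_SIZE:

```python
SEQ_LENGTH = 40   # how many characters the model sees as input
STEP_SIZE = 3     # how far the window moves between two samples

sentences = []        # input sequences of 40 characters each
next_characters = []  # the character that follows each sequence

for i in range(0, len(text) - SEQ_LENGTH, STEP_SIZE):
    sentences.append(text[i: i + SEQ_LENGTH])
    next_characters.append(text[i + SEQ_LENGTH])
```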
In the code above we define a sequence length of 40 and a step size of 3. This means that our model will look at the last forty characters to determine the next one. After that, the window moves three characters forward and we create the next training sample.
Here we are working with labeled data, which means that we are engaging in supervised learning. Thus, for the training data, we always provide forty initial characters and then we add the actual next character to the second list.
Once we have executed this part of the code, we have a decent amount of training data but it is not yet converted into a numerical format. In order to change this, we will use the dictionaries we defined above.
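The conversion could be sketched like this, reusing the names from the snippets above:

```python
# One flag per (sample, position, possible character).
x = np.zeros((len(sentences), SEQ_LENGTH, len(characters)), dtype=bool)
y = np.zeros((len(sentences), len(characters)), dtype=bool)

for i, sentence in enumerate(sentences):
    for t, character in enumerate(sentence):
        x[i, t, char_to_index[character]] = 1
    y[i, char_to_index[next_characters[i]]] = 1
```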
We first create two NumPy arrays full of zeros that have the right dimensions. You might be a bit confused as to why the length of the character set is important here. This is because we are not just passing the sentences as numerical values directly, but we are flagging the individual positions.
In other words: For each position in a sentence, we have a flag for every possible character that could be at that position. One of those flags will be set to one and all the others are going to be set to zero. This means that our training data is full of binary flags that indicate what the individual sentences look like. A structure like this makes it easier for a neural network to process the information.
Building The Model
Now that our training data is prepared, we can start building the model. For this we will have to add some imports from Keras.
In this tutorial we are going to keep the model quite simple and basic. We will have a single LSTM layer with 128 neurons and one dense layer with as many neurons as there are different characters. The activation function used here will be softmax.
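A minimal sketch of such a model, including the Keras imports mentioned above:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(128, input_shape=(SEQ_LENGTH, len(characters))))
model.add(Dense(len(characters), activation="softmax"))
```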
For the sake of simplicity we are not going to talk about all the mathematical details of the optimizers, loss function and activation function used here. However, it is important to know that the softmax function normalizes the outputs, which means that for each possible character we will get the probability of it being the “right” next choice.
Finally we train the model using our prepared data. In this example we use four epochs and a batch size of 256. This means that the model is going to see the same data four times and it is going to see 256 samples at once. You can tweak these values to increase the performance of the model.
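One way to compile and train the model; the optimizer and loss shown here (RMSprop with a learning rate of 0.01 and categorical cross-entropy) are an assumption, but a common pairing for one-hot encoded targets like ours:

```python
from tensorflow.keras.optimizers import RMSprop

# Assumed optimizer and loss; categorical cross-entropy fits one-hot targets.
model.compile(loss="categorical_crossentropy",
              optimizer=RMSprop(learning_rate=0.01))

# Four passes over the data, 256 samples per weight update.
model.fit(x, y, batch_size=256, epochs=4)
```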
Generating Texts
For the final part, we now need to do all the previous steps backwards. We need to convert the binary outputs into characters and we need to combine those individual characters into longer sequences of text.
To do this, we will write two functions. The first one will be a sample function, which will (kind of) randomly choose a reasonable character from the prediction output. The second one will then use this function to generate a full text of a given length.
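A version of the sample function, following the well-known helper from the official Keras text-generation example:

```python
def sample(preds, temperature=1.0):
    # Rescale the predicted probabilities by the temperature.
    preds = np.asarray(preds).astype("float64")
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    # Draw one character index according to the adjusted distribution.
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
```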
As you can see, this function has a parameter called temperature. It determines how conservative or experimental the choice of the next letter shall be. The sample function chooses one of the more likely softmax outputs. However, the temperature decides how much we lower our standards.
The lower the temperature, the more conservative our samples. The higher the temperature, the more experimental they are. We should strive for a balance here since a too conservative choice will result in boring and repetitive texts, whereas a too experimental choice will result in unreadable texts with randomly made up words.
To now finally generate a text, we need to specify the desired length and the temperature. Our text generation function will then pick a random starting point in the text, to find the first forty characters for the input. We take this initial text and convert it into the format that we used for the training data as well. This is necessary because our model cannot start generating text from scratch.
After that, we feed this data into our model and pick one of the more likely samples. We convert this result to an actual character and append it to the input text. After forty such iterations the model is no longer generating based on Shakespeare’s actual texts but on its own generated characters. This can go on for as long as we want it to.
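A sketch of such a generation function, with generate_text as a placeholder name and reusing the model and dictionaries defined earlier:

```python
def generate_text(length, temperature):
    # Pick a random starting point in the original text for the first window.
    start_index = random.randint(0, len(text) - SEQ_LENGTH - 1)
    sentence = text[start_index: start_index + SEQ_LENGTH]
    generated = sentence

    for _ in range(length):
        # Encode the current window exactly like the training data.
        x_pred = np.zeros((1, SEQ_LENGTH, len(characters)))
        for t, character in enumerate(sentence):
            x_pred[0, t, char_to_index[character]] = 1

        predictions = model.predict(x_pred, verbose=0)[0]
        next_index = sample(predictions, temperature)
        next_character = index_to_char[next_index]

        generated += next_character
        # Slide the window one character forward.
        sentence = sentence[1:] + next_character

    return generated
```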
Results
Last but not least, let us get to the fun part. We will now use our script to actually generate some poetic texts with a neural network. For this we will play around with the temperature a bit.
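The samples below could then be produced with calls along these lines:

```python
print(generate_text(300, 0.4))   # more conservative
print(generate_text(300, 0.6))   # medium
print(generate_text(300, 0.8))   # more experimental
```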
Settings: Length = 300, Temperature = 0.4 (More Conservative)
ay, marry, thou dost the more thou dost the mornish, and if the heart of she gentleman, or will, the heart with the serving a to stay thee, i will be my seek of the sould stay stay the fair thou meanter of the crown of manguar; the send their souls to the will to the breath: the wry the sencing with the sen
Settings: Length = 300, Temperature = 0.6 (Medium)
warwick: and, thou nast the hearth by the createred to the daughter, that he word the great enrome; that what then; if thou discheak, sir. clown: sir i have hath prance it beheart like!
Settings: Length = 300, Temperature = 0.8 (More Experimental)
i hear him speak. what! can so young a thordo, word appeal thee, but they go prife with recones, i thou dischidward! has thy noman, lookly comparmal to had ester, and, my seatiby bedath, romeo, thou lauke be; how will i have so the gioly beget discal bone. clown: i have seemitious in the step--by this low,
When you consider that our model doesn’t even know what a word or a sentence is, those results are quite impressive. Obviously they are far from perfect and there are some language models out there that are way more sophisticated and effective than this one.
However, what you also need to keep in mind here is that our model is very minimal with just one LSTM layer. Also, the results you see above are from a model that was only trained on a subset of the data. There is a lot of room for improvement. We can definitely increase the performance by adding more layers and training the neural network on the whole dataset.
I hope you enjoyed this little AI project on text generation. If so, I would appreciate your clap.
If you are interested in more content from my side, you can check out my social media accounts, my books or my YouTube channel.