Hey, thanks for the response. This happens because the model is trained on uni-grams: each training example pairs a single word with the word that follows it. The rest of the sentence doesn't matter, since the next word is predicted from the last word alone. You can get higher accuracy by rebuilding the dataset with bi-grams or tri-grams, so that the last two or three words are used for each prediction.
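To make this concrete, here is a minimal sketch of how the training pairs change as you widen the context window. The `build_ngram_pairs` helper and the sample sentence are just illustrative, not part of the course code:

```python
def build_ngram_pairs(tokens, n):
    """Build (context, next_word) training pairs from a token list.

    n=1 -> uni-gram context (predict from the last word only),
    n=2 -> bi-gram context (last two words), and so on.
    """
    pairs = []
    for i in range(n, len(tokens)):
        context = tuple(tokens[i - n:i])    # the last n words
        pairs.append((context, tokens[i]))  # the word to predict
    return pairs

tokens = "the quick brown fox jumps over the lazy dog".split()

# Uni-gram: each prediction depends only on the previous word.
print(build_ngram_pairs(tokens, 1)[:3])
# [(('the',), 'quick'), (('quick',), 'brown'), (('brown',), 'fox')]

# Tri-gram: the last three words are used for each prediction.
print(build_ngram_pairs(tokens, 3)[:2])
# [(('the', 'quick', 'brown'), 'fox'), (('quick', 'brown', 'fox'), 'jumps')]
```

The model architecture stays the same; only the shape of the input changes, so switching from uni-grams to tri-grams is mostly a matter of regenerating the dataset this way.

Feel free to let me know if you have any other queries. Thank you.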