Replace tfds.load('imdb_reviews/subwords8k', ...) with tfds.load('imdb_reviews', ...), since the prebuilt subword configs have been deprecated in recent TFDS releases. Then build your own tokenizer with SubwordTextEncoder.build_from_corpus on the training split, map it over the dataset with tf.py_function (the encoder operates on eager Python strings, so it cannot run inside a graph-mode map), and finally use padded_batch to handle the resulting variable-length sequences before feeding them into your model. A sketch of the full pipeline follows.
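Here is a minimal sketch of that pipeline. It assumes a recent TFDS version where SubwordTextEncoder lives under tfds.deprecated.text (older releases had it under tfds.features.text); target_vocab_size=2**13 is chosen to roughly match the old 8k subword config, and names like encode_map_fn, BUFFER_SIZE, and BATCH_SIZE are illustrative:

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# Load the plain-text IMDB dataset instead of the deprecated
# 'imdb_reviews/subwords8k' config.
(train_ds, test_ds), info = tfds.load(
    'imdb_reviews',
    split=['train', 'test'],
    as_supervised=True,  # yields (text, label) pairs
    with_info=True,
)

# Build a subword tokenizer from the raw training text.
# NOTE: in older TFDS versions this class is
# tfds.features.text.SubwordTextEncoder instead.
tokenizer = tfds.deprecated.text.SubwordTextEncoder.build_from_corpus(
    (text.numpy() for text, _ in train_ds),
    target_vocab_size=2**13,  # ~8k subwords, like the old config
)

def encode(text, label):
    # tokenizer.encode needs an eager Python/bytes string, which is
    # why this runs under tf.py_function rather than in graph mode.
    return tokenizer.encode(text.numpy()), label

def encode_map_fn(text, label):
    encoded, label = tf.py_function(
        encode, inp=[text, label], Tout=(tf.int64, tf.int64)
    )
    # tf.py_function drops shape information; restore it so that
    # padded_batch can work out the padded shapes downstream.
    encoded.set_shape([None])
    label.set_shape([])
    return encoded, label

BUFFER_SIZE = 10000  # illustrative shuffle buffer
BATCH_SIZE = 64      # illustrative batch size

train_batches = (
    train_ds
    .map(encode_map_fn)
    .shuffle(BUFFER_SIZE)
    # Pad each batch to the length of its longest sequence.
    .padded_batch(BATCH_SIZE, padded_shapes=([None], []))
)
test_batches = test_ds.map(encode_map_fn).padded_batch(
    BATCH_SIZE, padded_shapes=([None], [])
)
```

On TF 2.2+ you can drop the explicit padded_shapes argument and let padded_batch infer it, but passing it keeps the sketch compatible with earlier 2.x releases. Keep in mind the set_shape calls are load-bearing: without them, the shapes coming out of tf.py_function are unknown and shape inference fails later in the pipeline.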