One approach is to return only the most recent messages in each run to avoid hitting token limits.
To illustrate, here is a reference pattern for calling a GPT model without exceeding its token limit.

This example estimates tokens roughly by word count (`len(prompt.split())`). If the prompt plus the desired response would exceed the model's token limit (`token_limit`), the prompt is trimmed to fit within that limit. You can apply the same method when calling the GPT-3.5 or GPT-4 API.
In this way, you avoid token overflow while keeping responses within the model's limit.
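A minimal sketch of the idea described above. The `split()`-based token estimate and the limit/reserve values are assumptions for illustration; in practice a real tokenizer (e.g. `tiktoken`) gives accurate counts, and the actual context window depends on the model you use.

```python
# Word-count-based prompt trimming: drop the oldest words so that
# the prompt plus a reserved response budget fits the token limit.
# TOKEN_LIMIT and MAX_RESPONSE_TOKENS are assumed values, not real
# limits for any specific model.

TOKEN_LIMIT = 4096         # assumed model context window
MAX_RESPONSE_TOKENS = 500  # tokens reserved for the model's reply


def estimate_tokens(text: str) -> int:
    """Rough token estimate: one token per whitespace-separated word."""
    return len(text.split())


def trim_prompt(prompt: str,
                token_limit: int = TOKEN_LIMIT,
                max_response_tokens: int = MAX_RESPONSE_TOKENS) -> str:
    """Trim the oldest words until prompt + response fit the limit."""
    budget = token_limit - max_response_tokens
    words = prompt.split()
    if len(words) > budget:
        words = words[-budget:]  # keep only the most recent words
    return " ".join(words)
```

The trimmed string can then be sent as the prompt in your API call; because the most recent words are kept, the model still sees the latest context.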