You can optimize memory usage when deploying large generative models by referring to the following:
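The reference code combines three techniques to reduce memory usage when deploying generative models: quantization (storing weights in a lower-precision format such as 8-bit integers), activation checkpointing (recomputing intermediate activations instead of keeping them in memory, which matters when a backward pass is run, for example during fine-tuning), and mixed precision (running most operations in 16-bit floating point instead of 32-bit).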
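To make the memory savings concrete, here is a minimal sketch in plain Python. It is not the reference code itself. It only illustrates the arithmetic behind lower-precision storage and a simple symmetric int8 quantization round trip. The helper names (`weight_memory_gib`, `quantize_int8`, `dequantize_int8`) and the 7-billion-parameter figure are assumptions chosen for illustration. Real frameworks typically quantize per channel or per group rather than per tensor, and the estimate covers weights only, not activations or caches.

```python
from array import array

# Bytes needed to store one parameter in each format.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127].

    Returns the quantized values (1 byte each) and the scale needed
    to map them back to floats.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    clamped = (max(-127, min(127, round(w / scale))) for w in weights)
    return array("b", clamped), scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the int8 representation."""
    return [q * scale for q in quantized]


# Hypothetical model size, used only to show the arithmetic.
num_params = 7_000_000_000
for dtype in ("float32", "float16", "int8"):
    print(f"{dtype}: {weight_memory_gib(num_params, dtype):.1f} GiB")

# Round trip on a few example weights.
weights = [0.5, -1.0, 0.25, 0.0]
quantized, scale = quantize_int8(weights)
restored = dequantize_int8(quantized, scale)
print(list(quantized), restored)
```

Halving the bytes per parameter halves the weight memory, which is why float16 and int8 storage are common first steps. The price is a rounding error of at most half the scale per weight in this scheme, so accuracy should be checked after quantizing.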