Google AI researchers introduce HyperDreamBooth: an AI approach that efficiently generates personalized weights from just a single image of a person and is smaller and 25x faster than DreamBooth


The field of generative Artificial Intelligence is getting the attention it deserves. Recent developments in text-to-image (T2I) personalization have opened up intriguing possibilities for innovative uses. Personalization, that is, generating a specific person in varied contexts and styles while preserving high fidelity to their identity, has become a prominent topic in generative AI. Face personalization, the ability to generate new, variously styled photos of a particular face or person, has been made possible by leveraging pre-trained diffusion models, which carry strong priors over a wide range of styles.

Current approaches like DreamBooth and comparable techniques succeed because they can incorporate new subjects into the model without erasing its prior knowledge, preserving the essence and specifics of the subject even when it is rendered in widely different styles. Still, DreamBooth comes with notable limitations around model size and training speed: it fine-tunes all the weights of the diffusion model's UNet and text encoder, yielding a personalized model of over 1GB for Stable Diffusion, and its training procedure takes around 5 minutes, which may hinder widespread adoption and practical application.

To overcome these issues, a team of researchers from Google Research has introduced HyperDreamBooth, a hypernetwork that efficiently generates a small set of personalized weights from just a single image of a person. These personalized weights are then composed with the diffusion model, which undergoes fast fine-tuning. The end result is a system that can generate a person's face in a variety of contexts and styles while preserving fine subject details as well as the diffusion model's essential knowledge of diverse styles and semantic modifications.
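To make the workflow concrete, here is a minimal, illustrative PyTorch sketch of the two-stage idea described above. The `HyperNetwork` module, its sizes, and the weight-injection step are hypothetical stand-ins for illustration, not the paper's actual architecture.

```python
# Hedged sketch of the HyperDreamBooth-style two-stage flow: a hypernetwork maps a
# single face image to a small set of personalized weights, which then initialize a
# short fine-tuning phase. All module names and dimensions here are assumptions.
import torch
import torch.nn as nn

class HyperNetwork(nn.Module):
    """Toy stand-in: encodes a face image into a flat vector of personalized weights."""
    def __init__(self, n_personalized_params: int, img_size: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=4, stride=4), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(16 * (img_size // 4) ** 2, n_personalized_params),
        )

    def forward(self, face_image: torch.Tensor) -> torch.Tensor:
        return self.encoder(face_image)

# 1) Predict a small set of personalized weights from a single reference photo.
hyper = HyperNetwork(n_personalized_params=10_000)
face = torch.randn(1, 3, 64, 64)          # placeholder for the reference image
predicted_weights = hyper(face)           # strong directional initialization

# 2) Inject the predicted weights into the diffusion model's low-rank slots and run a
#    handful of fast fine-tuning steps to recover fine identity details (the injection
#    details depend on the actual UNet/text-encoder implementation).
```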


The speed of HyperDreamBooth is one of its greatest accomplishments: it personalizes faces in about 20 seconds, 25 times faster than DreamBooth and 125 times faster than the related technique Textual Inversion. Moreover, this quick personalization procedure needs only one reference image while keeping the same degree of quality and stylistic diversity as DreamBooth. HyperDreamBooth also excels in model size: the resulting personalized model is 10,000 times smaller than a regular DreamBooth model, a substantial advantage that makes it more manageable and significantly reduces storage requirements.

The team has summarized their contributions as follows: 

  1. Lightweight DreamBooth (LiDB): a personalized text-to-image model whose personalized part is only about 100KB, achieved by training the DreamBooth model in a low-dimensional weight space spanned by a random orthogonal incomplete basis inside a low-rank adaptation (LoRA) weight space (see the sketch after this list).
  2. New HyperNetwork architecture: using LiDB's configuration, a hypernetwork predicts the personalized weights for a given subject in a text-to-image diffusion model. This provides a strong directional initialization, enabling fast fine-tuning to high subject fidelity within a few iterations, and makes the method 25 times faster than DreamBooth at comparable quality.
  3. Rank-relaxed finetuning: a technique that relaxes the rank of a LoRA DreamBooth model during optimization to enhance subject fidelity. It allows the personalized model to be initialized with the HyperNetwork's initial approximation and then refined with higher-rank fine-tuning to capture high-level subject details.
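As a rough illustration of the LiDB idea in the first item, the PyTorch sketch below factors a LoRA-style residual through frozen random orthogonal "aux" bases so that only two tiny core matrices are trainable per subject. The class name `LiDBLinear`, the rank and core dimensions, and the wrapping strategy are assumptions for illustration, not the authors' released code.

```python
# Hedged sketch of a Lightweight-DreamBooth-style linear layer: the personalized
# residual is routed through fixed random orthogonal bases, so the only trainable
# (and stored) parameters are two tiny "core" matrices.
import torch
import torch.nn as nn

def random_orthogonal(rows: int, cols: int) -> torch.Tensor:
    """Random matrix with orthonormal columns (an incomplete orthogonal basis)."""
    q, _ = torch.linalg.qr(torch.randn(rows, cols))
    return q

class LiDBLinear(nn.Module):
    def __init__(self, base: nn.Linear, lora_rank: int = 1, core_dim: int = 4):
        super().__init__()
        self.base = base  # pre-trained projection, kept frozen
        d_out, d_in = base.weight.shape
        # Frozen random orthogonal aux bases (shared, not stored per subject).
        self.register_buffer("aux_down", random_orthogonal(d_in, core_dim))   # d_in  x c
        self.register_buffer("aux_up",   random_orthogonal(d_out, core_dim))  # d_out x c
        # Tiny trainable cores: the only per-subject parameters.
        self.core_down = nn.Parameter(torch.zeros(core_dim, lora_rank))       # c x r
        self.core_up   = nn.Parameter(torch.zeros(lora_rank, core_dim))       # r x c

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Low-rank personalized residual: x -> aux_down -> cores -> aux_up
        delta = x @ self.aux_down @ self.core_down @ self.core_up @ self.aux_up.T
        return self.base(x) + delta

# Usage: wrap an attention projection and train only the cores. Rank-relaxed
# fine-tuning (item 3) would then refine a higher-rank LoRA initialized from the
# hypernetwork's prediction to recover fine subject details.
layer = LiDBLinear(nn.Linear(320, 320), lora_rank=1, core_dim=4)
out = layer(torch.randn(2, 320))
```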

Check out the Paper and Project Page.



Tanya Malhotra is a final year undergrad from the University of Petroleum & Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with strong analytical and critical-thinking skills, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.


