Microsoft Research Asia scientists have developed GAIA, a method for generating talking avatars from a single portrait image and a speech sample.
Previous avatar generation methods relied on domain-specific heuristics, such as warping-based motion representations and 3D Morphable Models, which limit the diversity and naturalness of the results.
GAIA instead uses a two-stage, data-driven process: first, it disentangles each video frame into separate motion and appearance representations; then, it generates a motion sequence conditioned on the speech and the reference portrait, which is rendered back into video frames.
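The two-stage idea can be illustrated with a minimal sketch. All names here are hypothetical stand-ins, not the authors' actual models: a real system would use a learned encoder/decoder for disentanglement and a generative model (conditioned on speech) for motion, but the data flow is the same.

```python
import numpy as np

def encode_frame(frame):
    """Stage 1 (stand-in): split a frame into an appearance code
    (static content such as identity) and a motion code (frame-varying
    content such as pose and lip shape)."""
    appearance = frame.mean(axis=0)   # coarse, static component
    motion = frame - appearance       # residual, frame-varying component
    return appearance, motion

def generate_motion(speech_features, n_frames, dim):
    """Stage 2 (stand-in): map speech features to a motion-code sequence.
    A real system would use a learned generative model here."""
    rng = np.random.default_rng(0)
    base = rng.standard_normal((n_frames, dim))
    # Condition the sequence on the speech by simple per-frame modulation.
    return base * speech_features[:n_frames, None]

def decode(appearance, motion_seq):
    """Render frames by recombining the reference appearance with the
    generated motion codes."""
    return appearance[None, :] + motion_seq

# Toy run: one reference "frame" and a short speech clip.
ref_frame = np.ones((4, 8))            # toy 4x8 frame
speech = np.linspace(0.5, 1.0, 16)     # 16 frames of speech features
appearance, _ = encode_frame(ref_frame)
motion_seq = generate_motion(speech, n_frames=16, dim=8)
video = decode(appearance, motion_seq)
print(video.shape)                     # (16, 8): 16 generated frames
```

The key design point the sketch preserves is that appearance comes only from the reference portrait, while motion comes only from the speech, so the two factors can be recombined freely at generation time.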
The researchers trained the system on a large-scale, high-quality talking-avatar dataset; the resulting model outperformed existing methods in naturalness, diversity, lip-sync quality, and visual quality.
Furthermore, the system is scalable and general, and it extends to other applications such as generating avatars controlled by textual instructions.