    Microsoft Research Asia scientists have developed GAIA, a method for generating talking avatars from a single portrait image and a speech sample.

    Previous avatar generation methods used domain-specific heuristics such as warping-based motion representation and 3D Morphable Models, which limit the diversity and realism of the results.

    GAIA uses a two-stage process: it first disentangles each video frame into separate motion and appearance representations, and then generates a motion sequence conditioned on the speech and the reference portrait, from which the output frames are rendered.
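    The two-stage idea can be sketched roughly as follows. Note that this is an illustrative toy sketch, not the paper's actual architecture or API: the function names, feature splits, and the speech-to-motion mapping are all placeholder assumptions standing in for the learned models.

    ```python
    import numpy as np

    # Toy sketch of the two-stage pipeline (illustrative only).
    # Stage 1: an encoder disentangles a frame into an appearance code
    # (identity, texture) and a motion code (pose, expression).
    # Stage 2: a generator predicts motion codes from speech features,
    # and a decoder renders frames from appearance + motion.

    def encode_frame(frame: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
        """Placeholder: split a frame's features into appearance and motion."""
        flat = frame.reshape(-1)
        mid = flat.size // 2
        return flat[:mid].copy(), flat[mid:].copy()  # (appearance, motion)

    def predict_motion(speech: np.ndarray, appearance: np.ndarray,
                       n_frames: int) -> np.ndarray:
        """Placeholder for the speech-to-motion generator (Stage 2)."""
        dim = appearance.size
        # Sample one speech feature per output frame, broadcast to a motion code.
        steps = np.linspace(0, len(speech) - 1, n_frames).astype(int)
        return np.stack([np.full(dim, speech[t]) for t in steps])

    def decode(appearance: np.ndarray, motion: np.ndarray) -> np.ndarray:
        """Placeholder: render a frame from appearance + motion codes."""
        return np.concatenate([appearance, motion])

    # Usage: one portrait image + a speech clip -> a sequence of frames.
    portrait = np.random.rand(8, 8)
    speech = np.random.rand(50)              # e.g. per-step audio features
    appearance, _ = encode_frame(portrait)   # keep identity, discard motion
    motions = predict_motion(speech, appearance, n_frames=25)
    video = np.stack([decode(appearance, m) for m in motions])
    print(video.shape)                       # (25, 64)
    ```

    The key property the sketch illustrates is that identity comes only from the single reference portrait, while all frame-to-frame variation is driven by the speech-conditioned motion codes.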

    The researchers trained the system on a large-scale, high-quality talking avatar dataset, and the resulting avatar generator was shown to be superior to existing methods in naturalness, diversity, lip-sync quality, and visual quality.

    Furthermore, the system is scalable and general, and it extends to other applications such as generating avatars from textual instructions.