    Microsoft Research Asia scientists have developed GAIA, a method for generating talking avatars from a single portrait image and a speech sample.

    Previous avatar generation methods used domain-specific heuristics such as warping-based motion representation and 3D Morphable Models, which limit the diversity and realism of the results.

    GAIA uses a two-stage process: it first disentangles each video frame into separate motion and appearance representations, and then generates a motion sequence conditioned on the speech and the reference portrait, from which the output frames are rendered.
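    The two-stage idea can be sketched roughly as follows. Note that this is an illustrative toy sketch, not the paper's actual architecture or API: the function names, feature splits, and the speech-to-motion mapping are all placeholder assumptions standing in for the learned models.

    ```python
    import numpy as np

    # Toy sketch of the two-stage pipeline (illustrative only).
    # Stage 1: an encoder disentangles a frame into an appearance code
    # (identity, texture) and a motion code (pose, expression).
    # Stage 2: a generator predicts motion codes from speech features,
    # and a decoder renders frames from appearance + motion.

    def encode_frame(frame: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
        """Placeholder: split a frame's features into appearance and motion."""
        flat = frame.reshape(-1)
        mid = flat.size // 2
        return flat[:mid].copy(), flat[mid:].copy()  # (appearance, motion)

    def predict_motion(speech: np.ndarray, appearance: np.ndarray,
                       n_frames: int) -> np.ndarray:
        """Placeholder for the speech-to-motion generator (Stage 2)."""
        dim = appearance.size
        # Sample one speech feature per output frame, broadcast to a motion code.
        steps = np.linspace(0, len(speech) - 1, n_frames).astype(int)
        return np.stack([np.full(dim, speech[t]) for t in steps])

    def decode(appearance: np.ndarray, motion: np.ndarray) -> np.ndarray:
        """Placeholder: render a frame from appearance + motion codes."""
        return np.concatenate([appearance, motion])

    # Usage: one portrait image + a speech clip -> a sequence of frames.
    portrait = np.random.rand(8, 8)
    speech = np.random.rand(50)              # e.g. per-step audio features
    appearance, _ = encode_frame(portrait)   # keep identity, discard motion
    motions = predict_motion(speech, appearance, n_frames=25)
    video = np.stack([decode(appearance, m) for m in motions])
    print(video.shape)                       # (25, 64)
    ```

    The key property the sketch illustrates is that identity comes only from the single reference portrait, while all frame-to-frame variation is driven by the speech-conditioned motion codes.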

    The researchers trained the system on a large-scale, high-quality talking avatar dataset, and the resulting avatar generator was shown to be superior to existing methods in naturalness, diversity, lip-sync quality, and visual quality.

    Furthermore, the system is scalable and general, and it extends to other applications such as generating avatars from textual instructions.