• Published on

    Pika Labs has released Pika 1.5, which introduces "Pikaffects": a suite of six transformation effects, namely Inflate, Explode, Crush, Melt, Squish and "Cake-ify".

    Key technical improvements include enhanced physics simulations, longer video clip generation capabilities, and improved realism in character animations and movements.

    The platform now offers advanced cinematic camera controls like Bullet Time, Crash Zoom, Whip Pan and Crane shots.

  • Published on

    Kling, developed by the Kuaishou AI Team, is an AI video generation model capable of creating high-quality videos up to two minutes long at 30fps.

    Kling utilises a 3D spatio-temporal joint attention mechanism to model complex motion while maintaining physical accuracy. The model generates video at up to 1080p resolution with variable aspect ratios, and can transform static images into five-second animated sequences.

    The system allows users to control video generation through text prompts and can automatically extend existing videos by an additional 4.5 seconds. Furthermore, Kling supports consecutive video extensions, enabling the creation of videos up to 3 minutes in length.

    The model is available through Kuaishou's platform and Fal.
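    Kuaishou has not published Kling's architecture in detail, so the sketch below is only an illustration of what a *joint* spatio-temporal attention pass means, in contrast to factorised spatial-then-temporal attention: every token attends across space and time in a single pass. All names and the random projection matrices are illustrative, not Kling's implementation.

```python
import numpy as np

def joint_spatiotemporal_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over all video tokens jointly.

    x: (T, H, W, D) video latent. The whole 3D volume is flattened so
    tokens attend across space AND time at once, rather than in
    separate spatial and temporal passes.
    """
    T, H, W, D = x.shape
    tokens = x.reshape(T * H * W, D)            # flatten the 3D volume
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])     # (N, N) joint affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    out = weights @ v
    return out.reshape(T, H, W, -1)

# toy example: 2 frames of a 4x4 latent with 8 channels
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 4, 8))
w = [rng.standard_normal((8, 8)) * 0.1 for _ in range(3)]
y = joint_spatiotemporal_attention(x, *w)
print(y.shape)  # (2, 4, 4, 8)
```

    The trade-off this illustrates is cost: joint attention scales quadratically in T·H·W, which is why many video models factorise it; a joint mechanism keeps motion and appearance coupled at the price of compute.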

  • Published on

    Stability AI has released Stable Video 3D, which generates 3D model videos from single images without requiring additional data.

    The technology builds on the company’s earlier release of Stable Video Diffusion, an adaptable base for a variety of video tasks, and is now available for commercial use with a Stability AI membership.

    The company claims Stable Video 3D outperforms other open-source alternatives, such as Zero123-XL, and features two variants for generating orbital videos and 3D video along specified camera paths.

    It can also create novel multi-view videos of an object and generate 3D meshes.

  • Published on

    Sora, developed by OpenAI, is a new AI-powered tool that creates videos from text prompts.

    It can currently generate videos up to a minute long that include complex scenes, specific types of motion and vibrant, accurate details.

    It achieves this by combining a deep understanding of language with a knowledge of how the physical world works in motion.

    The tool is still in development and researchers are seeking feedback from visual artists, designers and filmmakers to help improve it.

    Currently, Sora has some limitations: it may struggle to simulate complex physics accurately and to follow prompts that describe events precisely.

  • Published on

    Facebook AI Research has created a tool called DensePose, which can be applied to videos to generate colour-coded representations of the human figure, labelling each body part.

    Flode-Labs has now created Vid2DensePose, a tool based on DensePose designed to convert videos into this format for use in animation.

    It is particularly useful in conjunction with MagicAnimate, an application that can take a series of labelled frames and generate smooth, animated transitions between them, enabling the creation of advanced, realistic human animations from still images.

    Vid2DensePose is available on GitHub, and includes instructions for installation and use.

  • Published on

    MagicAnimate is a framework for generating new animation data given a reference image and a motion sequence.

    The authors use a conditional diffusion model for generation, with an additional encoder used to preserve the identity of the reference image in the output.

    A simple but effective method of producing smooth transitions between video frames is also introduced, which is necessary for producing animations of reasonable length.

    Comparisons with other state-of-the-art methods on two benchmark datasets show that MagicAnimate produces more temporally consistent animations, while also better preserving the appearance of the reference image.

    The method performs well on both short and long animations, and when animating reference images with different identities to the motion sequence, showing the robustness and versatility of the approach.
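    The paper describes its transition method only as a simple technique for smoothing the boundaries between independently generated video segments. As a rough, hypothetical illustration of the general idea (not the authors' exact algorithm, which operates on overlapping diffusion predictions), the sketch below cross-fades the frames that consecutive segments share:

```python
import numpy as np

def blend_segments(segments, overlap):
    """Stitch independently generated video segments, cross-fading the
    `overlap` frames that each consecutive pair shares in time.

    segments: list of (F, ...) frame arrays; each segment's last
    `overlap` frames cover the same timestamps as the next one's first.
    """
    out = segments[0].astype(float)
    for seg in segments[1:]:
        seg = seg.astype(float)
        w = np.linspace(0.0, 1.0, overlap)             # cross-fade weights
        w = w.reshape((overlap,) + (1,) * (seg.ndim - 1))
        out[-overlap:] = (1 - w) * out[-overlap:] + w * seg[:overlap]
        out = np.concatenate([out, seg[overlap:]])
    return out

# three 6-frame segments with a 2-frame overlap -> 6 + 4 + 4 = 14 frames
segs = [np.full((6, 2, 2), v) for v in (0.0, 1.0, 2.0)]
video = blend_segments(segs, overlap=2)
print(video.shape[0])  # 14
```

    Without some form of blending at segment boundaries, long animations assembled from fixed-length generations show visible seams, which is why a transition mechanism is needed for animations of reasonable length.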

  • Published on

    Microsoft Research Asia scientists have developed GAIA, a method for generating talking avatars from a single portrait image and a speech sample.

    Previous avatar generation methods used domain-specific heuristics such as warping-based motion representation and 3D Morphable Models, which limit the diversity and realism of the results.

    GAIA uses a two-stage process, first disentangling the input video into motion and appearance representations, and then generating a motion sequence from the speech and portrait reference.

    The researchers trained the system on a large-scale, high-quality talking avatar dataset, and the resulting avatar generator was shown to be superior to existing methods in terms of naturalness, diversity, lip-sync quality and visual quality.

    Furthermore, the system is scalable, general and can be used for other applications such as generating avatars from textual instructions.

  • Published on

    Character animation, or the task of creating the illusion of movement in otherwise static images, is an important and challenging aspect of computer graphics.

    In a paper released on arXiv, a group of researchers from the Alibaba Group detail a new method for training a character animation model using a type of neural network called a diffusion model. Their method, which they have called Animate Anyone, preserves the intricate appearance details of the reference image and uses a technique called spatial attention to merge detail features.

    It also uses a technique called pose guiding to direct the character’s movement and an approach called temporal modelling to ensure smooth transitions between frames in the resulting animation.

    In testing, the researchers used datasets of fashion photos and human dance videos to demonstrate that their method outperformed existing approaches and could generate realistic animations from images of different characters.
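    The spatial-attention idea above can be sketched in a few lines: the denoising features attend not only over themselves but also over the reference image's features, letting appearance detail be copied into the output. This is a minimal, hypothetical NumPy illustration with projection matrices omitted for brevity, not the authors' implementation:

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def reference_spatial_attention(x, ref):
    """Denoising features `x` attend over themselves AND the reference
    image features `ref`, so fine appearance detail can be merged in.

    x, ref: (N, D) token matrices (query/key/value projections omitted).
    """
    kv = np.concatenate([x, ref], axis=0)      # keys/values include ref tokens
    scores = x @ kv.T / np.sqrt(x.shape[-1])   # (N, 2N) affinities
    return softmax(scores) @ kv                # weighted mix of self + ref

rng = np.random.default_rng(1)
x = rng.standard_normal((16, 8))    # 4x4 latent, 8 channels
ref = rng.standard_normal((16, 8))  # reference image features
y = reference_spatial_attention(x, ref)
print(y.shape)  # (16, 8)
```

    The key design choice this illustrates is that the reference enters through the attention keys and values rather than through simple concatenation of inputs, so the model can selectively pull detail from the reference at each spatial location.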

  • Published on

    Researchers from Shanghai AI Lab and Stanford University have presented an approach that enables the extraction of camera trajectory and character motion from filmed content for replication in new 2D or 3D content.

    The method enables the preservation of complex camera movements and character motion from the original shot and simulates new scenes with different characters, lighting, or environments.

    The researchers have demonstrated the technology through a series of video examples showcasing 2D and 3D cinematic transfers of filmed content ranging from Hollywood movies to animated features, with accurate preservation of camera movements and character motions.

    The team has made their code and datasets publicly available.

  • Published on

    Video creation AI company Pika has announced Pika 1.0, a new AI model which generates and edits videos in several styles, including 3D animation and cinematic footage.

    The new web experience also aims to improve usability.

    Pika 1.0 marks the company’s efforts to fulfil its vision of enabling everyone to be a director of their own content.

    Having started six months ago, the company has grown to half a million users generating millions of videos weekly. Pika has also announced it has raised $55m in funding, including investment from Elad Gil and Adam D’Angelo.

  • Published on

    Stability AI has released Stable Video Diffusion, its first foundation model for generative video, based on the image model Stable Diffusion.

    The model is currently available in research preview: the company has made the code available on GitHub and the weights available on Hugging Face.

    Two image-to-video models are available, generating 14 and 25 frames at customisable frame rates.

    The model is for research purposes only and is not yet intended for commercial use.