Quick Start Guide For LTX-2 In ComfyUI

At CES 2026, Lightricks released the highly anticipated open weights of the LTX-2 audio-video model, marking a major step forward in AI video and audio generation. Optimized for NVIDIA GPUs, LTX-2 is the leading open-weights audio-video model, capable of generating clips at up to 4K resolution and 50 FPS, up to 20 seconds long.

The models are now available for download in BF16 precision. The base model is also available in quantized NVFP8 weights that cut model size by roughly 30% and can deliver up to 2x faster performance on RTX GPUs.

This guide gets you running with an RTX-optimized ComfyUI workflow in minutes.

LTX-2 Audio-Video Model

LTX-2 is a family of models that generate video with synchronized audio. Five checkpoints are available at launch:

  • Base: the standard, versatile text-to-video and image-to-video generator. Trainable and customizable.
  • 8-step: a distilled version of the model that enables fast iteration for idea exploration.
  • Camera control LoRA: a set of checkpoints that give precise control over camera movement.
  • Latent upsampler: useful for multiscale pipelines that reach the highest quality faster.
  • IC-LoRAs: depth, Canny-edge, and pose LoRAs that give more control over specific compositional elements.

Quick Start

  1. Install ComfyUI or update to the latest version from ComfyUI.org.
  2. Open the Template Browser, navigate to Video and download your desired variant of LTX-2. 
    • For LTX-2 base, make sure you select NVFP8 if you have an NVIDIA GeForce RTX 40 Series GPU, an RTX Pro Ada Generation GPU, a DGX Spark, or newer.
  3. Recommended Settings:
    • On 24GB+ GPUs, we recommend using 720p24, 4-second clips with 20 steps.
    • On 8-16GB GPUs, we recommend using 540p24, 4-second clips with 20 steps.
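
The recommended starting points above can be sketched as a small helper. This is an illustrative Python snippet, not part of ComfyUI; the sub-8GB fallback is our own conservative assumption, not a recommendation from this guide.

```python
# Hypothetical helper (not a ComfyUI API): map available VRAM in GB to the
# guide's recommended starting settings for LTX-2.
def recommended_settings(vram_gb: float) -> dict:
    if vram_gb >= 24:
        # 24GB+ GPUs: 720p at 24 FPS, 4-second clips, 20 steps.
        return {"resolution": "720p", "fps": 24, "seconds": 4, "steps": 20}
    if vram_gb >= 8:
        # 8-16GB GPUs: 540p at 24 FPS, 4-second clips, 20 steps.
        return {"resolution": "540p", "fps": 24, "seconds": 4, "steps": 20}
    # Below 8GB the guide makes no recommendation; this conservative
    # fallback (our assumption) leans on weight streaming.
    return {"resolution": "540p", "fps": 15, "seconds": 3, "steps": 20}
```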

Optimizing VRAM Usage

As a frontier model, LTX-2 uses a significant amount of video memory (VRAM) to deliver quality results. Memory use grows with resolution, framerate, clip length, and step count. Fortunately, ComfyUI and NVIDIA have collaborated on an optimized weight streaming feature that offloads parts of the workflow to system memory when the GPU runs out of VRAM, though this comes at a cost in performance.

Depending on your GPU and use case, you may want to constrain these factors to keep generation times reasonable. For example, a GeForce RTX 5090 has 32GB of VRAM and can generate a 720p, 24 FPS, 4-second clip entirely in GPU memory in about 25 seconds. Extending that clip to 8 seconds, however, increases the generation time to about three minutes, because the workflow then requires more than 32GB of VRAM and automatically engages weight streaming.
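
A back-of-the-envelope way to see why a longer clip blows past the VRAM budget: assuming memory footprint scales roughly linearly with total pixel-frames (width × height × FPS × seconds) — a simplifying assumption on our part, not a statement about LTX-2's internals — an 8-second clip costs about twice a 4-second one at the same resolution and framerate.

```python
# Sketch under the assumption that footprint scales linearly with
# total pixel-frames = width * height * fps * seconds.
def pixel_frames(width: int, height: int, fps: int, seconds: int) -> int:
    return width * height * fps * seconds

clip_4s = pixel_frames(1280, 720, 24, 4)  # the in-memory RTX 5090 example
clip_8s = pixel_frames(1280, 720, 24, 8)  # the clip that triggers streaming

print(clip_8s / clip_4s)  # 2.0 -- doubling length doubles the footprint
```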

Recommendation: use lower settings to iterate on your video, then increase the settings to tune the quality to what you want. In our experience it’s best to:

  • Decrease the length of the video to 4 seconds (16GB+) or 3 seconds (12GB+).
  • Then decrease the resolution to 720p (16GB+) or 540p (12GB+).
  • If your video does not require much motion, decrease the framerate to 15 FPS.
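
The reduction order above can be expressed as a simple routine. This is an illustrative sketch, not a ComfyUI API: `fits_in_vram` is a placeholder predicate you would supply yourself (for example, by test-running a short generation), and the thresholds come from the list above.

```python
# Illustrative sketch of the reduction ladder: shorten first, then lower
# resolution, then drop framerate. fits_in_vram is a user-supplied check.
def shrink_until_fits(settings: dict, fits_in_vram) -> dict:
    s = dict(settings)  # work on a copy
    # 1. Shorten the clip first.
    for seconds in (4, 3):
        if fits_in_vram(s):
            return s
        s["seconds"] = seconds
    # 2. Then lower the resolution.
    for resolution in ("720p", "540p"):
        if fits_in_vram(s):
            return s
        s["resolution"] = resolution
    # 3. Finally, drop the framerate for low-motion videos.
    if not fits_in_vram(s):
        s["fps"] = 15
    return s
```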

Optimizing Quality

LTX-2 is an advanced model capable of generating amazing videos. But as with any model, tweaking the settings will have a big impact on quality. The community will come up with fantastic recommendations now that the model weights are available, but here are the pro tips that have helped the most in our testing:

  • Resolution: The highest quality is typically achieved at 1080p.
  • Frame Rate:
    • Motion-heavy videos benefit greatly from higher FPS. We see better results going up to 50 FPS, even if that requires reducing resolution to keep generation times reasonable.
    • Static videos, such as close-ups of a person or an object, can typically work at 15 FPS.
  • Text-to-Video vs Image-to-Video: Providing a high-quality input image typically improves the quality of the output, as it gives clear visual guidance for the first frames, provided the prompted motion is not overly complex. A complicated movement without a clear reference or instruction can cause the clip to degrade unexpectedly after a few frames.
  • Steps: In our testing, 20 steps was the sweet spot between performance and quality, but going up to 30 steps and beyond should increase quality further.
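
Putting these tips together, a common pattern is a two-pass workflow: iterate with a cheap draft preset, then rerun the chosen prompt at a quality preset. The preset values below come from this guide's recommendations; the names and helper function are illustrative, not part of any LTX-2 or ComfyUI API.

```python
# Draft preset: fast iteration (the guide's 8-16GB starting point).
DRAFT = {"resolution": "540p", "fps": 24, "seconds": 4, "steps": 20}

# Quality preset: 1080p, 50 FPS for motion, 30 steps (per the tips above).
QUALITY = {"resolution": "1080p", "fps": 50, "seconds": 4, "steps": 30}

def upgrade(draft: dict) -> dict:
    """Return a copy of a draft preset bumped to the quality tier."""
    final = dict(draft)
    final.update(QUALITY)
    return final
```

Keeping the clip length identical between passes makes the draft a reasonable preview of the final motion.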