Fix model offloading and training tests + prevent examples timeout #14091

Merged
sayakpaul merged 3 commits into huggingface:main from GiGiKoneti:fix/vae-and-examples-ci-failures on Jul 4, 2026
Conversation

@GiGiKoneti

Contributor

What does this PR do?

This PR fixes three pre-existing bugs in model offloading, training tests, and example test runs that were causing CI failures:

  1. AutoencoderVidTok return format mismatch:
    • Modified AutoencoderVidTok.forward to return (dec,) when return_dict=False, aligning it with the standard VAE return contract in Diffusers.
    • Unskipped test_outputs_equivalence, since the outputs now match.
  2. AutoencoderDC mixed precision skip:
    • Added a try-except block that catches RuntimeError: "GET was unable to find an engine to execute this computation" and calls pytest.skip when cuDNN cannot find a matching engine configuration.
  3. Examples timeout and distributed launch mitigation:
    • Refactored run_command in examples/test_examples_utils.py to use subprocess.run with a configurable timeout parameter (default 300s) so tests cannot hang indefinitely.
    • Appended --num_processes 1 --num_machines 1 to the ExamplesTestsAccelerate launch arguments to prevent distributed launch deadlocks on single-device/CPU CI runners.
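
For reviewers less familiar with the return contract mentioned in item 1, here is a minimal sketch of the convention. It uses plain Python stand-ins rather than the real Diffusers classes: `ToyVAE`, `ToyDecoderOutput`, and the doubling step that plays the role of encode + decode are all hypothetical, and the actual `AutoencoderVidTok.forward` will differ.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class ToyDecoderOutput:
    """Hypothetical stand-in for an output class exposing a `sample` field."""

    sample: List[float]


class ToyVAE:
    """Hypothetical model; not the real AutoencoderVidTok."""

    def forward(
        self, sample: List[float], return_dict: bool = True
    ) -> Union[ToyDecoderOutput, Tuple[List[float]]]:
        # Placeholder for the real encode + decode round trip.
        dec = [2.0 * x for x in sample]
        if not return_dict:
            # Tuple form: a one-element tuple, not the bare value.
            return (dec,)
        return ToyDecoderOutput(sample=dec)


vae = ToyVAE()
as_tuple = vae.forward([1.0, 2.0], return_dict=False)
as_object = vae.forward([1.0, 2.0])

# An equivalence check can compare the two forms element by element.
assert as_tuple[0] == as_object.sample
```

Returning a one-element tuple keeps `output[0]` valid for both forms of caller code, which is presumably what an outputs-equivalence test relies on.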

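The timeout change in item 3 could look roughly like the sketch below. This is an illustration under assumptions, not the merged code: the `return_stdout` parameter, the error messages, and the choice to re-raise as `RuntimeError` are hypothetical, and only `subprocess.run`, its `timeout` argument, the 300s default, and the two launch flags come from the description above.

```python
import subprocess
import sys
from typing import List, Optional


def run_command(
    command: List[str], return_stdout: bool = False, timeout: int = 300
) -> Optional[str]:
    """Run `command` and fail fast instead of hanging (hypothetical sketch)."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout,  # seconds; the child is killed when this expires
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
    except subprocess.TimeoutExpired as e:
        raise RuntimeError(
            f"Command `{' '.join(command)}` timed out after {timeout}s"
        ) from e
    except subprocess.CalledProcessError as e:
        output = e.output.decode(errors="replace") if e.output else ""
        raise RuntimeError(
            f"Command `{' '.join(command)}` failed with:\n{output}"
        ) from e
    if return_stdout:
        return result.stdout.decode(errors="replace")
    return None


# Single-process launch prefix using the flags named in the description.
launch_args = ["accelerate", "launch", "--num_processes", "1", "--num_machines", "1"]

# Usage with a trivial child process, so the sketch runs without accelerate.
output = run_command([sys.executable, "-c", "print('ok')"], return_stdout=True)
```

Converting `TimeoutExpired` into a regular test failure means a stuck example script shows up as one failed test rather than a stalled CI job.
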
Fixes #14090

Before submitting

Who can review?

@sayakpaul @DN6 @pcuenca

Comment thread examples/test_examples_utils.py Outdated
Comment thread tests/models/autoencoders/test_models_autoencoder_dc.py Outdated
Comment thread tests/models/autoencoders/test_models_autoencoder_vidtok.py
@GiGiKoneti force-pushed the fix/vae-and-examples-ci-failures branch from f1b7e44 to 5c98abf on June 30, 2026 08:53
@github-actions (bot) removed the examples label on Jun 30, 2026
Comment thread tests/models/testing_utils/training.py Outdated
@GiGiKoneti force-pushed the fix/vae-and-examples-ci-failures branch from 71f7b15 to f16aa73 on July 1, 2026 11:41
@HuggingFaceDocBuilderDev


The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sayakpaul

Member

Failing tests are unrelated.

@sayakpaul merged commit cbdb637 into huggingface:main on Jul 4, 2026
13 of 15 checks passed

3 participants