timm
PyTorch Image Models
Rank: #1147 · Downloads: 10,616,676 (30 days)
- What's New
- Introduction
- Models
- Features
- Results
- Getting Started (Documentation)
- Train, Validation, Inference Scripts
- Awesome PyTorch Resources
- Licenses
- Citing
What's New
Feb 23, 2026
- Add token distillation training support to distillation task wrappers
- Remove some torch.jit usage in prep for official deprecation
- Caution added to AdamP optimizer
- Call reset_parameters() even if meta-device init so that buffers get init w/ hacks like init_empty_weights
- Tweak Muon optimizer to work with DTensor/FSDP2 (clamp_ instead of clamp_min_, alternate NS branch for DTensor)
- Release 1.0.25
Jan 21, 2026
- Compat Break: Fix oversight w/ QKV vs MLP bias in `ParallelScalingBlock` (& `DiffParallelScalingBlock`). Does not impact any trained `timm` models but could impact downstream use.
Jan 5 & 6, 2026
- Release 1.0.24
- Add new benchmark result csv files for inference timing on all models w/ RTX Pro 6000, 5090, and 4090 cards w/ PyTorch 2.9.1
- Fix moved module error in deprecated timm.models.layers import path that impacts legacy imports
- Release 1.0.23
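
For context on the legacy import fix above, the deprecated `timm.models.layers` namespace forwards to `timm.layers`, which is the preferred location. A minimal sketch using `DropPath` as an arbitrary example layer:

```python
# Current import location (preferred):
from timm.layers import DropPath

# Legacy path still resolves, but emits a deprecation warning:
# from timm.models.layers import DropPath

drop = DropPath(drop_prob=0.1)  # stochastic depth layer, API unchanged between paths
```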
Dec 30, 2025
- Add better NAdaMuon trained `dpwee`, `dwee`, `dlittle` (differential) ViTs with a small boost over previous runs
- Add a ~21M param `timm` variant of the CSATv2 model at 512x512 & 640x640
  - https://huggingface.co/timm/csatv2_21m.sw_r640_in1k (83.13% top-1)
  - https://huggingface.co/timm/csatv2_21m.sw_r512_in1k (82.58% top-1)
- Factor non-persistent param init out of `__init__` into a common method that can be externally called via `init_non_persistent_buffers()` after meta-device init.
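
A minimal sketch of the meta-device flow described in the last item, assuming `init_non_persistent_buffers()` is exposed as a model method as the note suggests; the model name is taken from the CSATv2 checkpoints above, and the exact call pattern may differ between versions:

```python
import torch
import timm

# Build the model graph on the 'meta' device (no real parameter storage allocated)
with torch.device('meta'):
    model = timm.create_model('csatv2_21m.sw_r640_in1k', pretrained=False)

# Materialize parameters on a real device, then re-create non-persistent buffers
model = model.to_empty(device='cpu')
model.init_non_persistent_buffers()  # per the note above; assumed to be a model method
```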
Dec 12, 2025
- Add CSATv2 model (thanks https://github.com/gusdlf93) -- a lightweight but high-res model with DCT stem & spatial attention. https://huggingface.co/Hyunil/CSATv2
- Add AdaMuon and NAdaMuon optimizer support to existing `timm` Muon impl. Appears more competitive vs AdamW with familiar hparams for image tasks (see the sketch after this list).
- End of year PR cleanup, merge aspects of several long open PRs
  - Merge differential attention (`DiffAttention`), add corresponding `DiffParallelScalingBlock` (for ViT), train some wee ViTs
  - Add a few pooling modules, `LsePlus` and `SimPool`
  - Cleanup, optimize `DropBlock2d` (also add support to ByobNet based models)
- Bump unit tests to PyTorch 2.9.1 + Python 3.13 on upper end, lower still PyTorch 1.13 + Python 3.10
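
As a rough illustration of the AdaMuon/NAdaMuon item above, the new variants would presumably be selected through the usual `timm` optimizer factory; the `'adamuon'` / `'nadamuon'` registration names are an assumption, not confirmed here:

```python
import timm
from timm.optim import create_optimizer_v2

model = timm.create_model('vit_base_patch16_224', pretrained=False)

# opt='adamuon' / 'nadamuon' are assumed registration names for the new Muon variants;
# lr / weight_decay mirror a typical AdamW-style setup for image tasks.
optimizer = create_optimizer_v2(model, opt='adamuon', lr=1e-3, weight_decay=0.05)
```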
Dec 1, 2025
- Add lightweight task abstraction, add logits and feature distillation support to train script via new tasks.
- Remove old APEX AMP support
Nov 4, 2025
- Fix LayerScale / LayerScale2d init bug (init values ignored), introduced in 1.0.21. Thanks https://github.com/Ilya-Fradlin
- Release 1.0.22
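
To make the LayerScale fix concrete: `init_values` sets the initial per-channel scale (`gamma`), which the 1.0.21 regression ignored. A small sketch of the expected behaviour after the fix:

```python
import torch
from timm.layers import LayerScale

# gamma should start at init_values (1e-5 here); the 1.0.21 bug left it at the default
ls = LayerScale(dim=8, init_values=1e-5)
assert torch.allclose(ls.gamma, torch.full((8,), 1e-5))
```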
Oct 31, 2025 🎃
- Update imagenet & OOD variant result csv files to include a few new models and verify correctness over several torch & timm versions
- EfficientNet-X and EfficientNet-H B5 model weights added as part of a hparam search for AdamW vs Muon (still iterating on Muon runs)
Oct 16-20, 2025
- Add an impl of the Muon optimizer (based on https://github.com/KellerJordan/Muon) with customizations
- extra flexibility and improved handling for conv weights and fallbacks for weight shapes not suited for orthogonalization
- small speedup for NS iterations by reducing allocs and using fused (b)add(b)mm ops
- by default uses AdamW (or NAdamW if `nesterov=True`) updates if Muon not suitable for parameter shape (or excluded via param group flag)
- like torch impl, select from several LR scale adjustment fns via `adjust_lr_fn`
- select from several NS coefficient presets or specify your own via `ns_coefficients` (see the sketch after this list)
- First 2 steps of 'meta' device model initialization supported
- Fix several ops that were breaking creation under 'meta' device context
- Add device & dtype factory kwarg support to all models and modules (anything inheriting from nn.Module) in `timm`
- License fields added to pretrained cfgs in code
- Release 1.0.21
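
A hedged sketch tying the items above together: device/dtype factory kwargs at model creation, and Muon via the optimizer factory. The `'muon'` registration name and kwarg passthrough are assumptions; `nesterov`, `adjust_lr_fn`, and `ns_coefficients` are the knobs named above:

```python
import torch
import timm
from timm.optim import create_optimizer_v2

# dtype/device factory kwargs per the note above (supported combinations may vary)
model = timm.create_model('vit_base_patch16_224', pretrained=False, dtype=torch.bfloat16)

# 'muon' is an assumed registration name; fallback updates become NAdamW with nesterov=True.
# adjust_lr_fn=... / ns_coefficients=... can also be passed per the notes above.
optimizer = create_optimizer_v2(model, opt='muon', lr=2e-2, weight_decay=0.05, nesterov=True)
```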
Sept 21, 2025
- Remap DINOv3 ViT weight tags from `lvd_1689m` -> `lvd1689m` to match (same for `sat_493m` -> `sat493m`)
- Release 1.0.20
Sept 17, 2025
- DINOv3 (https://arxiv.org/abs/2508.10104) ConvNeXt and ViT models added. ConvNeXt models were mapped to the existing `timm` model. ViT support done via the EVA base model w/ a new `RotaryEmbeddingDinoV3` to match the DINOv3 specific RoPE impl
- MobileCLIP-2 (https://arxiv.org/abs/2508.20691) vision encoders. New MCI3/MCI4 FastViT variants added and weights mapped to existing FastViT and B, L/14 ViTs.
- MetaCLIP-2 Worldwide (https://arxiv.org/abs/2507.22062) ViT encoder weights added.
- SigLIP-2 (https://arxiv.org/abs/2502.14786) NaFlex ViT encoder weights added via timm NaFlexViT model.
- Misc fixes and contributions
July 23, 2025
- Add `set_input_size()` method to EVA models, used by OpenCLIP 3.0.0 to allow resizing for timm based encoder models (see the sketch after this list).
- Release 1.0.18, needed for PE-Core S & T models in OpenCLIP 3.0.0
- Fix small typing issue that broke Python 3.9 compat. 1.0.19 patch release.
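
A short sketch of the `set_input_size()` hook mentioned above; the model name and keyword spelling are illustrative, only the method itself is confirmed by the note:

```python
import timm

model = timm.create_model('eva02_base_patch14_224.mim_in22k', pretrained=True)
# Rebuild pos embeds / patch grid for a larger input, as OpenCLIP 3.0.0 does for its encoders
model.set_input_size(img_size=448)
```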
July 21, 2025
- ROPE support added to NaFlexViT. All models covered by the EVA base (`eva.py`) including EVA, EVA02, Meta PE ViT, `timm` SBB ViT w/ ROPE, and Naver ROPE-ViT can now be loaded in NaFlexViT when `use_naflex=True` is passed at model creation time (see the sketch after this list)
- More Meta PE ViT encoders added, including small/tiny variants, lang variants w/ tiling, and more spatial variants.
- PatchDropout fixed with NaFlexViT and also w/ EVA models (regression after adding Naver ROPE-ViT)
- Fix XY order with grid_indexing='xy', impacted non-square image use in 'xy' mode (only ROPE-ViT and PE impacted).
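
A sketch of the `use_naflex=True` path described above, using one of the Naver ROPE-ViT weights listed in the table further below; treat it as illustrative rather than the exact supported invocation:

```python
import torch
import timm

# Load an EVA-based ROPE-ViT into the NaFlexViT implementation
model = timm.create_model(
    'vit_base_patch16_rope_mixed_224.naver_in1k', pretrained=True, use_naflex=True,
)
out = model(torch.randn(1, 3, 224, 224))  # plain image tensors should still be accepted
```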
July 7, 2025
- MobileNet-v5 backbone tweaks for improved Google Gemma 3n behaviour (to pair with updated official weights)
- Add stem bias (zero'd in updated weights, compat break with old weights)
- GELU -> GELU (tanh approx). A minor change to be closer to JAX
- Add two arguments to layer-decay support, a min scale clamp and 'no optimization' scale threshold
- Add 'Fp32' LayerNorm, RMSNorm, SimpleNorm variants that can be enabled to force computation of norm in float32
- Some typing, argument cleanup for norm, norm+act layers done with above
- Support Naver ROPE-ViT (https://github.com/naver-ai/rope-vit) in `eva.py`, add `RotaryEmbeddingMixed` module for mixed mode, weights on HuggingFace Hub
| model | img_size | top1 | top5 | param_count (M) |
|---|---|---|---|---|
| vit_large_patch16_rope_mixed_ape_224.naver_in1k | 224 | 84.84 | 97.122 | 304.4 |
| vit_large_patch16_rope_mixed_224.naver_in1k | 224 | 84.828 | 97.116 | 304.2 |
| vit_large_patch16_rope_ape_224.naver_in1k | 224 | 84.65 | 97.154 | 304.37 |
| vit_large_patch16_rope_224.naver_in1k | 224 | 84.648 | 97.122 | 304.17 |
| vit_base_patch16_rope_mixed_ape_224.naver_in1k | 224 | 83.894 | 96.754 | 86.59 |
| vit_base_patch16_rope_mixed_224.naver_in1k | 224 | 83.804 | 96.712 | 86.44 |
| vit_base_patch16_rope_ape_224.naver_in1k | 224 | 83.782 | 96.61 | 86.59 |
| vit_base_patch16_rope_224.naver_in1k | 224 | 83.718 | 96.672 | 86.43 |
| vit_small_patch16_rope_224.naver_in1k | 224 | 81.23 | 95.022 | 21.98 |
| vit_small_patch16_rope_mixed_224.naver_in1k | 224 | 81.216 | 95.022 | 21.99 |
| vit_small_patch16_rope_ape_224.naver_in1k | 224 | 81.004 | 95.016 | 22.06 |
| vit_small_patch16_rope_mixed_ape_224.naver_in1k | 224 | 80.986 | 94.976 | 22.06 |
- Some cleanup of ROPE modules, helpers, and FX tracing leaf registration
- Preparing version 1.0.17 release
June 26, 2025
- MobileNetV5 backbone (w/ encoder only variant) for Gemma 3n image encoder
- Version 1.0.16 released
June 23, 2025
- Add F.grid_sample based 2D and factorized pos embed resize to NaFlexViT. Faster when lots of different sizes (based on example by https://github.com/stas-sl).
- Further speed up patch embed resample by replacing vmap with matmul (based on snippet by https://github.com/stas-sl).
- Add 3 initial native-aspect NaFlexViT checkpoints created while testing: trained on ImageNet-1k with 3 different pos embed configs and otherwise the same hparams.
| Model | Top-1 Acc | Top-5 Acc | Params (M) | Eval Seq Len |
|---|---|---|---|---|
| naflexvit_base_patch16_par_gap.e300_s576_in1k | 83.67 | 96.45 | 86.63 | 576 |
| naflexvit_base_patch16_parfac_gap.e300_s576_in1k | 83.63 | 96.41 | 86.46 | 576 |
| naflexvit_base_patch16_gap.e300_s576_in1k | 83.50 | 96.46 | 86.63 | 576 |
- Support gradient checkpointing for `forward_intermediates()` and fix some checkpointing bugs (see the sketch at the end of this list). Thanks https://github.com/brianhou0208
- Add 'corrected weight decay' (https://arxiv.org/abs/2506.02285) as option to AdamW (legacy), Adopt, Kron, Adafactor (BV), Lamb, LaProp, Lion, NadamW, RmsPropTF, SGDW optimizers
- Switch PE (perception encoder) ViT models to use native timm weights instead of remapping on the fly
- Fix cuda stream b
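
A hedged sketch combining two of the items above: grabbing intermediate features via `forward_intermediates()` from one of the NaFlexViT checkpoints in the table, with gradient checkpointing enabled; keyword details may differ across `timm` versions:

```python
import torch
import timm

model = timm.create_model('naflexvit_base_patch16_par_gap.e300_s576_in1k', pretrained=True)
model.set_grad_checkpointing(True)  # checkpointing now also covers forward_intermediates
x = torch.randn(2, 3, 384, 384)     # 384x384 / patch16 -> 576 token eval seq len per the table
final, intermediates = model.forward_intermediates(x)
print(final.shape, [t.shape for t in intermediates])
```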