timm

PyTorch Image Models


What's New

Feb 23, 2026

  • Add token distillation training support to distillation task wrappers
  • Remove some torch.jit usage in prep for official deprecation
  • Caution added to AdamP optimizer
  • Call reset_parameters() even under meta-device init so that buffers get initialized when using hacks like init_empty_weights
  • Tweak Muon optimizer to work with DTensor/FSDP2 (clamp_ instead of clamp_min_, alternate NS branch for DTensor)
  • Release 1.0.25

Jan 21, 2026

  • Compat Break: Fix oversight w/ QKV vs MLP bias in ParallelScalingBlock (& DiffParallelScalingBlock)
    • Does not impact any trained timm models but could impact downstream use.

Jan 5 & 6, 2026

  • Release 1.0.24
  • Add new benchmark result csv files for inference timing on all models w/ RTX Pro 6000, 5090, and 4090 cards w/ PyTorch 2.9.1
  • Fix moved module error in deprecated timm.models.layers import path that impacts legacy imports
  • Release 1.0.23

Dec 1, 2025

  • Add a lightweight task abstraction; add logits and feature distillation support to the train script via new tasks
  • Remove old APEX AMP support

Oct 31, 2025 🎃

  • Update imagenet & OOD variant result csv files to include a few new models and verify correctness over several torch & timm versions
  • EfficientNet-X and EfficientNet-H B5 model weights added as part of a hparam search for AdamW vs Muon (still iterating on Muon runs)

Oct 16-20, 2025

  • Add an impl of the Muon optimizer (based on https://github.com/KellerJordan/Muon) with customizations (usage sketch after this list)
    • Extra flexibility and improved handling for conv weights, plus fallbacks for weight shapes not suited to orthogonalization
    • Small speedup for NS iterations by reducing allocations and using fused (b)add(b)mm ops
    • By default uses AdamW (or NAdamW if nesterov=True) updates when Muon is not suitable for a parameter's shape (or the parameter is excluded via a param group flag)
    • Like the torch impl, select from several LR scale adjustment fns via adjust_lr_fn
    • Select from several NS coefficient presets or specify your own via ns_coefficients
  • First 2 steps of 'meta' device model initialization supported (sketch below)
    • Fix several ops that were breaking creation under a 'meta' device context
    • Add device & dtype factory kwarg support to all models and modules (anything inheriting from nn.Module) in timm
  • License fields added to pretrained cfgs in code
  • Release 1.0.21
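
A minimal usage sketch for the new Muon support, assuming the optimizer factory registers it under the name 'muon' (the lr/momentum/weight_decay values are illustrative, not recommendations):

```python
import timm
from timm.optim import create_optimizer_v2

model = timm.create_model('resnet50', pretrained=False)

# Assumed factory name 'muon'; per the notes above, params whose shapes don't
# suit orthogonalization (or that are excluded via a param group flag) fall
# back to AdamW-style updates automatically.
optimizer = create_optimizer_v2(
    model, opt='muon', lr=2e-2, momentum=0.95, weight_decay=0.05,
)
```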
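
And a sketch of the meta-device flow described above; to_empty() is standard PyTorch, while the device/dtype factory kwargs are per the changelog notes:

```python
import torch
import timm

# Step 1: build the model graph on the 'meta' device -- no real memory is allocated.
with torch.device('meta'):
    model = timm.create_model('vit_base_patch16_224')

# Step 2: materialize storage on a real device; parameters are uninitialized at
# this point and must be re-initialized or overwritten by a checkpoint load.
model = model.to_empty(device='cpu')

# Per the changelog, the factory kwargs should also allow direct placement, e.g.:
# model = timm.create_model('vit_base_patch16_224', device='cuda', dtype=torch.bfloat16)
```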

Sept 21, 2025

  • Remap DINOv3 ViT weight tags from lvd_1689m -> lvd1689m to match (same for sat_493m -> sat493m)
  • Release 1.0.20

July 23, 2025

  • Add set_input_size() method to EVA models, used by OpenCLIP 3.0.0 to allow resizing of timm based encoder models (example after this list).
  • Release 1.0.18, needed for PE-Core S & T models in OpenCLIP 3.0.0
  • Fix small typing issue that broke Python 3.9 compat. 1.0.19 patch release.
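
A short sketch of the resizing flow; the model name is a real timm EVA02 variant, but the img_size kwarg is an assumption for illustration:

```python
import torch
import timm

model = timm.create_model('eva02_base_patch14_224.mim_in22k', pretrained=False)

# Assumed kwarg name; resamples position embeddings (and related state) so the
# encoder can run at 448x448 instead of its pretrained 224x224 resolution.
model.set_input_size(img_size=448)
out = model(torch.randn(1, 3, 448, 448))
```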

July 21, 2025

  • ROPE support added to NaFlexViT. All models covered by the EVA base (eva.py), including EVA, EVA02, Meta PE ViT, timm SBB ViT w/ ROPE, and Naver ROPE-ViT, can now be loaded in NaFlexViT when use_naflex=True is passed at model creation time (example after this list)
  • More Meta PE ViT encoders added, including small/tiny variants, lang variants w/ tiling, and more spatial variants.
  • PatchDropout fixed with NaFlexViT and also w/ EVA models (regression after adding Naver ROPE-ViT)
  • Fix XY order with grid_indexing='xy'; this impacted non-square image use in 'xy' mode (only ROPE-ViT and PE models affected).
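
For example (the use_naflex flag is quoted from the note above; the model name is a real Naver ROPE-ViT weight tag):

```python
import torch
import timm

# Load an eva.py-based ROPE model into the NaFlexViT implementation.
model = timm.create_model(
    'vit_base_patch16_rope_224.naver_in1k',
    pretrained=False,
    use_naflex=True,
)
out = model(torch.randn(1, 3, 224, 224))
```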

July 7, 2025

  • MobileNet-v5 backbone tweaks for improved Google Gemma 3n behaviour (to pair with updated official weights)
    • Add stem bias (zero'd in updated weights, compat break with old weights)
    • GELU -> GELU (tanh approx); a minor change to match JAX behaviour more closely
  • Add two arguments to layer-decay support: a min scale clamp and a 'no optimization' scale threshold (hypothetical sketch after this list)
  • Add 'Fp32' LayerNorm, RMSNorm, SimpleNorm variants that can be enabled to force computation of the norm in float32 (sketch after this list)
  • Some typing and argument cleanup for norm and norm+act layers done along with the above
  • Support Naver ROPE-ViT (https://github.com/naver-ai/rope-vit) in eva.py, add RotaryEmbeddingMixed module for mixed mode, weights on HuggingFace Hub
| model | img_size | top1 | top5 | param_count |
|---|---|---|---|---|
| vit_large_patch16_rope_mixed_ape_224.naver_in1k | 224 | 84.84 | 97.122 | 304.4 |
| vit_large_patch16_rope_mixed_224.naver_in1k | 224 | 84.828 | 97.116 | 304.2 |
| vit_large_patch16_rope_ape_224.naver_in1k | 224 | 84.65 | 97.154 | 304.37 |
| vit_large_patch16_rope_224.naver_in1k | 224 | 84.648 | 97.122 | 304.17 |
| vit_base_patch16_rope_mixed_ape_224.naver_in1k | 224 | 83.894 | 96.754 | 86.59 |
| vit_base_patch16_rope_mixed_224.naver_in1k | 224 | 83.804 | 96.712 | 86.44 |
| vit_base_patch16_rope_ape_224.naver_in1k | 224 | 83.782 | 96.61 | 86.59 |
| vit_base_patch16_rope_224.naver_in1k | 224 | 83.718 | 96.672 | 86.43 |
| vit_small_patch16_rope_224.naver_in1k | 224 | 81.23 | 95.022 | 21.98 |
| vit_small_patch16_rope_mixed_224.naver_in1k | 224 | 81.216 | 95.022 | 21.99 |
| vit_small_patch16_rope_ape_224.naver_in1k | 224 | 81.004 | 95.016 | 22.06 |
| vit_small_patch16_rope_mixed_ape_224.naver_in1k | 224 | 80.986 | 94.976 | 22.06 |
  • Some cleanup of ROPE modules, helpers, and FX tracing leaf registration
  • Preparing version 1.0.17 release
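
A hypothetical sketch of the two new layer-decay arguments; the names layer_decay_min_scale and layer_decay_no_opt_scale are assumptions for illustration, not confirmed API:

```python
import timm
from timm.optim import create_optimizer_v2

model = timm.create_model('vit_base_patch16_224', pretrained=False)

optimizer = create_optimizer_v2(
    model,
    opt='adamw',
    lr=1e-3,
    weight_decay=0.05,
    layer_decay=0.75,               # standard layer-wise LR decay factor
    layer_decay_min_scale=0.1,      # assumed name: clamp per-layer LR scale from below
    layer_decay_no_opt_scale=0.02,  # assumed name: skip optimizing params below this scale
)
```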
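
The 'Fp32' norm variants force the normalization math into float32 even when the surrounding model runs in lower precision. A minimal self-contained sketch of the idea (not timm's exact classes):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayerNormFp32(nn.LayerNorm):
    # Compute LayerNorm in float32 for numerical stability, then cast the
    # result back to the input dtype (useful under bf16/fp16 weights or autocast).
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weight = self.weight.float() if self.weight is not None else None
        bias = self.bias.float() if self.bias is not None else None
        out = F.layer_norm(x.float(), self.normalized_shape, weight, bias, self.eps)
        return out.to(x.dtype)

# A half-precision activation is normalized in fp32 and returned as fp16.
y = LayerNormFp32(64)(torch.randn(2, 8, 64, dtype=torch.float16))
```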

June 26, 2025

  • MobileNetV5 backbone (w/ encoder only variant) for Gemma 3n image encoder
  • Version 1.0.16 released

June 23, 2025

  • Add F.grid_sample based 2D and factorized pos embed resize to NaFlexViT; faster when many different sizes are used (based on an example by https://github.com/stas-sl). See the resize sketch at the end of this section.
  • Further speed up patch embed resampling by replacing vmap with matmul (based on a snippet by https://github.com/stas-sl).
  • Add 3 initial native-aspect NaFlexViT checkpoints created while testing: ImageNet-1k, 3 different pos embed configs w/ the same hparams.
| Model | Top-1 Acc | Top-5 Acc | Params (M) | Eval Seq Len |
|---|---|---|---|---|
| naflexvit_base_patch16_par_gap.e300_s576_in1k | 83.67 | 96.45 | 86.63 | 576 |
| naflexvit_base_patch16_parfac_gap.e300_s576_in1k | 83.63 | 96.41 | 86.46 | 576 |
| naflexvit_base_patch16_gap.e300_s576_in1k | 83.50 | 96.46 | 86.63 | 576 |
  • Support gradient checkpointing for forward_intermediates and fix some checkpointing bugs (usage sketch at the end of this section). Thanks https://github.com/brianhou0208
  • Add 'corrected weight decay' (https://arxiv.org/abs/2506.02285) as an option to AdamW (legacy), Adopt, Kron, Adafactor (BV), Lamb, LaProp, Lion, NadamW, RmsPropTF, and SGDW optimizers
  • Switch PE (perception encoder) ViT models to use native timm weights instead of remapping on the fly
  • Fix cuda stream b
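
For reference, a self-contained sketch of F.grid_sample based 2D pos embed resizing as described above (illustrative, not timm's exact implementation):

```python
import torch
import torch.nn.functional as F

def resize_pos_embed_2d(pos_embed: torch.Tensor, new_h: int, new_w: int) -> torch.Tensor:
    # pos_embed: (1, H*W, C) 2D position embedding, square grid, no prefix tokens.
    _, n, c = pos_embed.shape
    h = w = int(n ** 0.5)
    assert h * w == n, 'expects a square grid without prefix tokens'
    pe = pos_embed.reshape(1, h, w, c).permute(0, 3, 1, 2)  # -> (1, C, H, W)
    # Normalized sampling grid in [-1, 1]; grid_sample expects (x, y) order.
    ys = torch.linspace(-1, 1, new_h, device=pe.device, dtype=pe.dtype)
    xs = torch.linspace(-1, 1, new_w, device=pe.device, dtype=pe.dtype)
    grid = torch.stack(torch.meshgrid(ys, xs, indexing='ij'), dim=-1).flip(-1)
    out = F.grid_sample(pe, grid.unsqueeze(0), mode='bilinear', align_corners=True)
    return out.permute(0, 2, 3, 1).reshape(1, new_h * new_w, c)  # -> (1, H'*W', C)

resized = resize_pos_embed_2d(torch.randn(1, 14 * 14, 768), 16, 16)
```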
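
And a sketch of gradient-checkpointed feature extraction via forward_intermediates; both methods exist in timm, though the argument values here are illustrative:

```python
import torch
import timm

model = timm.create_model('vit_base_patch16_224', pretrained=False)
model.set_grad_checkpointing(True)  # checkpointing now also covers forward_intermediates

x = torch.randn(2, 3, 224, 224)
# Take only the last 4 intermediate feature maps (no final head output).
feats = model.forward_intermediates(x, indices=4, intermediates_only=True)
loss = sum(f.float().mean() for f in feats)
loss.backward()  # gradients flow through the checkpointed blocks
```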