TokenFormer

TokenFormer is an academic research tool that rethinks Transformer scaling with tokenized model parameters. It offers a fully attention-based neural network for enhanced architectural flexibility.

At a glance

Pricing: Open Source

Free tier: Yes

API: (not specified)

Skill level: Technical

About

What is TokenFormer?

TokenFormer is the official implementation of the ICLR 2025 Spotlight paper "TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters." It introduces a fully attention-based neural network that unifies token-token and token-parameter interactions, with the aim of maximizing architectural flexibility. By tokenizing both data and model parameters, TokenFormer is designed to scale natively, allowing progressive and efficient scaling. Attention handles both kinds of interaction: between input tokens, and between input tokens and model parameters. The approach aims to offer greater flexibility than traditional Transformers, with potential applications in foundation models, sparse inference (MoE), parameter-efficient tuning, device-cloud collaboration, and vision-language tasks.
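To make the idea of a token-parameter interaction concrete, here is a minimal sketch in plain Python. It assumes that the learnable parameters are stored as two sets of parameter tokens (`key_params` and `value_params`, both hypothetical names) and that each input token attends over them with standard scaled softmax attention. The official implementation may use a different normalization and layout; this is an illustration of the general pattern, not the repository's code.

```python
import math

def softmax(row):
    # Numerically stable softmax over one list of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def token_parameter_attention(tokens, key_params, value_params):
    """Each input token (query) attends over parameter tokens.

    tokens:       n x d_in   input tokens
    key_params:   m x d_in   key parameter tokens (assumed layout)
    value_params: m x d_out  value parameter tokens (assumed layout)
    returns:      n x d_out
    """
    d_in = len(key_params[0])
    d_out = len(value_params[0])
    outputs = []
    for x in tokens:
        # Similarity between this input token and every key parameter token.
        scores = [sum(xi * ki for xi, ki in zip(x, k)) / math.sqrt(d_in)
                  for k in key_params]
        # Softmax is an assumption here, borrowed from standard attention.
        weights = softmax(scores)
        # Weighted sum of value parameter tokens.
        out = [sum(w * v[j] for w, v in zip(weights, value_params))
               for j in range(d_out)]
        outputs.append(out)
    return outputs

tokens = [[1.0, 0.0], [0.0, 1.0]]
key_params = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
value_params = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
out = token_parameter_attention(tokens, key_params, value_params)
```

In this sketch the output width is set by the value parameter tokens and the number of parameter tokens `m` is independent of both input and output widths, which is what lets the parameter count change without changing the interface.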

Best used for

Ideal for professors and researchers who need to implement and experiment with advanced Transformer architectures, scale AI models incrementally, and explore novel attention mechanisms. Especially valuable for those working on foundation models, sparse inference, and parameter-efficient tuning.

Common actions

develop neural networks

scale AI models

research Transformer architectures

implement attention mechanisms

Capabilities

Key features

Tokenized model parameters
Fully attention-based network
Incremental model scaling
Language modeling benchmarks
Visual modeling benchmarks
PyTorch and JAX/TPU support

Target Audience

professor

Integrations

Not yet documented

Pricing & Plans

Open Source

Free

FAQs

What is the core innovation of TokenFormer?

TokenFormer introduces a fully attention-based neural network that tokenizes both data and model parameters. This approach unifies token-token and token-parameter interactions, maximizing architectural flexibility and enabling inherent model scalability beyond traditional Transformers.

What kind of scaling does TokenFormer support?

TokenFormer supports incremental model scaling, allowing larger Transformer architectures to be built upon smaller, previously trained models. This significantly reduces the overall cost and complexity associated with training very large models from scratch.
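One way such incremental growth could work is sketched below in plain Python. The assumptions are mine and not taken from this listing: parameters are stored as key and value parameter tokens, new key tokens are initialized to zero, and the attention weights use a GeLU over L2-normalized scores with a fixed scale `tau` (a hypothetical argument) instead of softmax. Under that normalization a zero score yields a zero weight, so the grown layer starts out computing the same function as the small one. The official implementation may differ in both initialization and normalization.

```python
import math

def gelu(x):
    # Exact GeLU via the error function; gelu(0.0) == 0.0.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def pattention(tokens, key_params, value_params, tau=1.0):
    """Token-parameter attention with GeLU over L2-normalized scores.

    The normalization and the fixed `tau` are assumptions of this sketch.
    """
    d_out = len(value_params[0])
    outputs = []
    for x in tokens:
        scores = [sum(xi * ki for xi, ki in zip(x, k)) for k in key_params]
        # L2 norm of the score row; zero-valued scores do not change it.
        norm = math.sqrt(sum(s * s for s in scores)) or 1.0
        weights = [gelu(tau * s / norm) for s in scores]
        outputs.append([sum(w * v[j] for w, v in zip(weights, value_params))
                        for j in range(d_out)])
    return outputs

def grow(key_params, value_params, new_values):
    """Append parameter tokens: zero keys paired with the given new values."""
    d_in = len(key_params[0])
    new_keys = [[0.0] * d_in for _ in new_values]
    return key_params + new_keys, value_params + new_values

tokens = [[0.5, -1.0], [2.0, 0.3]]
small_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
small_values = [[1.0, 2.0], [0.5, -0.5], [0.0, 1.0]]

# Grow from 3 to 5 parameter tokens; the new values are arbitrary.
big_keys, big_values = grow(small_keys, small_values,
                            [[0.7, -0.2], [0.1, 0.9]])

before = pattention(tokens, small_keys, small_values)
after = pattention(tokens, big_keys, big_values)
```

Here `before` and `after` agree up to floating-point error, because every new key produces a score of zero and therefore a weight of zero. With softmax, the added rows would take a share of the probability mass and shift the output, which is why this sketch swaps in a different normalization. Only the keys are zeroed, so the new values can still pass gradients to the new keys once training resumes.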

What benchmarks has TokenFormer been tested on?

TokenFormer has been evaluated on language modeling tasks using the Pile dataset with zero-shot evaluation, demonstrating competitive performance. It also includes plans for visual modeling benchmarks, specifically ImageNet-1K classification and DataComp-1B using a CLIP approach.
