
Long Convolutions for GPT-like Models: Polynomials, Fast Fourier ...
Dec 11, 2023 · Three options for what to do when multiplying polynomials, and what it means for the resulting convolution. Thus, to make Fourier models GPT-like, we need to adopt the “make it longer” …
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor ...
Nov 13, 2023 · We propose FlashFFTConv, a new algorithm for efficiently computing the FFT convolution on GPUs. FlashFFTConv speeds up convolutions by up to 7.93x over PyTorch and …
Simple Long Convolutions for Sequence Modeling · Hazy Research
Feb 15, 2023 · In our new paper, we show that directly parameterizing the convolution kernel works surprisingly well – with a twist! We need to add a simple regularization, and then long convolutions …
Hyena Hierarchy: Towards Larger Convolutional Language Models
Mar 7, 2023 · The Hyena operator is defined as a recurrence (controlling layer size) of two efficient subquadratic primitives: an implicit long convolution (i.e. Hyena filters parameterized by a feed …
From Deep to Long Learning? · Hazy Research
Mar 27, 2023 · It turns out that two simple insights led us to the answer: every SSM can be viewed as a convolution filter as long as the input sequence – so we can replace the SSM with a convolution …
Zoology (Blogpost 2): Simple, Input-Dependent, and Sub-Quadratic ...
Dec 11, 2023 · In our paper, we analyze our gated convolution layer and show that it provably simulates all gated convolution architectures (H3, Hyena, RWKV, RetNet, etc.).
Monarchs and Butterflies: Towards Sub-Quadratic Scaling in Model ...
Dec 11, 2023 · Monarch matrices are also the same basic idea behind FlashFFTConv. Since Monarch matrices generalize the FFT and are hardware-efficient, they form a natural opportunity to speed up …
Efficient language models as arithmetic circuits · Hazy Research
Jun 22, 2024 · Using the polynomial view, we first prove that any gated convolution model (including H3, BiGS, Hyena, RWKV, M2, etc.) can be simulated by a single canonical representation, BaseConv, …
Long-Context Retrieval Models with Monarch Mixer
Jan 11, 2024 · We replace attention by using Monarch matrices to construct a gated long convolution layer, similar to work like H3, Hyena, GSS, and BiGS. Specifically, Monarch matrices can implement …
The Safari of Deep Signal Processing: Hyena and Beyond
Jun 8, 2023 · Spectrum of long convolution filters of Safari models (H3 and Hyena), alongside visualization at initialization and after pretraining. The decay rate depends on the reduction operator …