Harmonic Innovation: My Journey with Stable Audio
First impressions of the much-awaited audio generation model from the makers of Stable Diffusion.
Introduction
In the evolving landscape of music production, I've had the opportunity to explore numerous AI models. Stable Audio, which has been publicly available since September 2023, is a tool I had the privilege of exploring while it was still in beta. It represents not just a leap in generative AI, which has broad applications from images to videos, but also a milestone in the ongoing journey of audio innovation. Stable Audio distinguishes itself with several standout features. Control over clip duration, BPM prompting, and stem-only (solo instrument) generation offer a great deal of creative freedom. It also excels at generating sound effects and accepts natural language prompts, so adjectives and descriptive words can shape the musical output. Together, these features give creators the flexibility to produce short musical snippets as well as starting points for more elaborate compositions.
lofi house, beautiful rhodes, deep house, tape degradation, casette tape, vinyl, melodic, 118bpm
The Journey Begins: Initial Encounters
As a beta tester with early access to Stable Audio, I approached the tool with prior experience of audio-generating models like Google's MusicLM, Riffusion, and Dance Diffusion. While these models showcased potential, each felt like it was missing something. Stable Audio brought its own set of strengths, chiefly its ability to generate coherent, high-quality audio. However, like all tools, it had its moments of unpredictability. For instance, some generations didn't quite capture the sound I envisioned, which suggests I need to refine my prompt engineering. In a dub reggae track I generated, the delays and echoes seemed slightly off. Nuances in subgenres also posed challenges: a request for "Happy Hardcore" (an electronic music subgenre) yielded a "Hardcore" rock track. And while generating a "spoken word" sample led to some intriguing results, I wondered about the model's suitability for such tasks. Regardless, each interaction showed massive potential for music makers and producers, highlighting Stable Audio's capabilities in a domain brimming with creative possibilities.
a male voice with layers of echo, delay and reverb
dub reggae in the style of augustus pablo, 70s, tape delay, dub echo, deep bass, loud drums
Happy hardcore, 90s, grimey, hard, heavy, 180bpm
Exploring Sounds: Diverse Experiments with Stable Audio
The allure of Stable Audio lies not just in its familiar outputs, but in the vast possibilities it presents. My experiments with arpeggio synth sounds and cumbia percussion loops showcased its ease in producing varied musical elements. However, my journey went beyond mere genre replication. In tasks involving full tracks, cross-genres, and stems and samples, Stable Audio showcased its capabilities, albeit with some occasional hiccups.
Full Tracks:
A hybrid track, vaporwave and dnb, 175bpm
80s jazz piano, cassette tape degredation, lofi, beautiful chords, 80bpm
A psychedelic cumbia track, 70s, guitar leads, loud percussions, cowbell, 90bpm
Stems and Samples:
80s jazz piano only, no drums, cassette tape degredation, lofi, beautiful chords, 80bpm
one shot kick drum sounds with silence in between. 70s style kick drum, muffled, fast decay, punchy, loud, compressed, saturated.
arpeggio harp only, pluck sound, A minor, beautiful, calm, reverb, 110bpm
A cumbia percussion loop only, 80bpm, cowbells, guiro, conga
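For loops like the ones above, it can help to choose a clip duration that holds a whole number of bars at the prompted tempo, so the sample cuts cleanly in a DAW. The following is a minimal sketch of that arithmetic, assuming 4/4 time and a constant tempo. The helper names `loop_seconds` and `build_prompt` are hypothetical and are not part of Stable Audio; the prompt layout simply mirrors the comma-separated style used in this post.

```python
def loop_seconds(bpm: float, bars: int, beats_per_bar: int = 4) -> float:
    """Length in seconds of `bars` bars at `bpm` (assumes a constant tempo)."""
    if bpm <= 0 or bars <= 0 or beats_per_bar <= 0:
        raise ValueError("bpm, bars and beats_per_bar must be positive")
    # One beat lasts 60 / bpm seconds; a bar has beats_per_bar beats.
    return bars * beats_per_bar * 60.0 / bpm


def build_prompt(description: str, tags: list[str], bpm: int) -> str:
    """Join a description, descriptive tags, and a tempo into one prompt string.

    Hypothetical helper: the comma-separated layout is an assumption taken
    from the example prompts in this post, not a documented format.
    """
    parts = [description, *tags, f"{bpm}bpm"]
    return ", ".join(p.strip() for p in parts if p.strip())


if __name__ == "__main__":
    print(build_prompt("A cumbia percussion loop only", ["cowbells", "guiro", "conga"], 80))
    print(loop_seconds(bpm=80, bars=4))
```

At 80 BPM, four bars of 4/4 last 12 seconds, so a 12-second duration would hold the cumbia loop exactly, provided the model keeps to the prompted tempo.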
Ambient Creations: Beyond Music
One of the intriguing capabilities of Stable Audio lies in its versatility beyond music generation. My exploration into generating ambient/environmental sounds, such as dogs barking or street noise, highlighted its potential as a sound design tool. These ambient creations felt authentic, and I can envision applications in video and videogame production, adding immersive soundscapes to scenes.
foley, keys jangling, kids at the park, dogs barking
Challenges and Wishes
No tool is without its quirks, and Stable Audio is no exception. Its beginner-friendly interface, while a boon for novices, left me yearning for more granular control. My journey wasn't without hiccups, like a one-time system freeze, but such teething issues are expected in early versions. Looking forward, I envisage a Stable Audio with advanced settings: prompt guidance, temperature tuning, and even batch options. My wishlist extends to stem separation, MIDI conversion, LoRA fine-tuning, the ability to continue a previous generation, and an API. But what truly excites me is the prospect of local installation.
Technical Brilliance and Future Horizons
Stable Audio is not just a marvel of audio generation; it's a symphony of technical ingenuity. From its use of a variational autoencoder (VAE) to the incorporation of the Descript Audio Codec, it stands as a testament to the prowess of Stability AI’s generative audio research lab, Harmonai. And as they continue to refine and evolve it, the future looks promising, with open-source models and training code on the horizon. (Edit: training code was released on October 11, 2023.)
Conclusion
As I reflect upon my odyssey with Stable Audio, it's clear that we're in the midst of a paradigm shift in music creation. Every note, every experiment, has added a new layer to my understanding of what AI can bring to the table. But the question remains: How will these tools redefine the essence of music in the years to come? As I continue my journey, I'm fueled by excitement and a desire to be at the forefront of this musical revolution. 🌀
Just for kicks, I made this quick and dirty track using samples generated from Stable Audio with some minor editing and processing on my end :)
If you haven't tried Stable Audio yet, give it a spin here. Need a user guide? Go here.
Special thanks to CJ (Dadabots) and the rest of the Harmonai team for granting me beta access for this review.