I've had a fun time following the buzz about Stable Diffusion (SD), but I've been itching to get my hands dirty with it.

This weekend, I put in a lot of hours to get a flake.nix working for SD. It was rough, but I'm happy now that I'm on the other side of it. I didn't manage to get it 100% declarative, but almost. I've probably also committed 100 sins in terms of what you shouldn't do when packaging stuff with Nix, but hey - it works!

You can find the repository here.

Prerequisites

  • You need an NVIDIA GPU with a sufficient amount of memory. For details on how to set this up on your NixOS device, see Xe's recent blog post.
  • You need Nix flakes enabled.
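
If you want to check both prerequisites before committing to a long build, here is a minimal sketch. The function names `enable_flakes` and `has_enough_vram` are made up for illustration, and the memory threshold is an arbitrary placeholder (this post does not state a required amount). It assumes a per-user `nix.conf`; on NixOS you may prefer to set the same option in your system configuration instead.

```shell
# Preflight sketch for the two prerequisites above. Function names are illustrative.

# Append the flakes setting to the per-user nix.conf unless it is already there.
# $1 optionally overrides the config directory (defaults to ~/.config/nix).
enable_flakes() {
  conf_dir="${1:-${XDG_CONFIG_HOME:-$HOME/.config}/nix}"
  line='experimental-features = nix-command flakes'
  mkdir -p "$conf_dir"
  touch "$conf_dir/nix.conf"
  grep -qxF "$line" "$conf_dir/nix.conf" || printf '%s\n' "$line" >>"$conf_dir/nix.conf"
}

# Read per-GPU memory totals in MiB (one number per line) from stdin and
# succeed if at least one GPU meets the threshold given as $1.
has_enough_vram() {
  min_mib="$1"
  while IFS= read -r mib; do
    [ "$mib" -ge "$min_mib" ] 2>/dev/null && return 0
  done
  return 1
}

# Usage (8000 MiB is a placeholder threshold, not a documented requirement):
#   enable_flakes
#   nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | has_enough_vram 8000
```

`enable_flakes` is safe to run more than once, since it only appends the line when an identical one is missing.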

Set up

Once you're able to run nvidia-smi and see your GPU, do the following:

git clone git@github.com:skogsbrus/stable-diffusion-nix-flake.git
cd stable-diffusion-nix-flake
nix develop

And that's it! But there's a big BUT: this will take a couple of hours (depending on your hardware), since some packages, like OpenCV and PyTorch, have to be compiled with CUDA support. It's probably possible to cut down the compilation time, but I haven't looked into it since I already have the compiled result... Sorry!
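
After a build that long, it may be worth confirming that the PyTorch inside the dev shell can actually see the GPU before going further. A minimal sketch, assuming `python3` in the `nix develop` shell is the interpreter with PyTorch installed; `torch_sees_cuda` and the `PYTHON` override are hypothetical names, not part of the repository:

```shell
# Succeeds only if the PyTorch in the current environment reports a usable CUDA device.
# PYTHON is a hypothetical override for the interpreter; it defaults to python3.
torch_sees_cuda() {
  "${PYTHON:-python3}" -c 'import sys, torch; sys.exit(0 if torch.cuda.is_available() else 1)'
}

# Usage, inside the `nix develop` shell:
#   torch_sees_cuda && echo "PyTorch can see the GPU" || echo "no CUDA device visible"
```

If the check fails even though nvidia-smi works, the likely suspects are the driver setup or a PyTorch build without CUDA support, but that is a guess rather than something this check can tell apart.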

Once the installation completes, download the model weights from here. Then move them out of your downloads folder and symlink them into the repository:

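mkdir -p ../sd-data models/ldm/stable-diffusion-v1
mv ~/Downloads/sd-v1-4.ckpt ../sd-data/sd-v1-4.ckpt
# Use an absolute target: a relative symlink target resolves from the link's directory, not the current one.
ln -s "$(realpath ../sd-data/sd-v1-4.ckpt)" models/ldm/stable-diffusion-v1/model.ckpt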

After that you can start generating images:

python3 optimizedSD/optimized_txt2img.py --H 512 --W 768 --n_iter 1 --n_samples 4 --ddim_steps 50 --prompt "Relativity (M.C. Escher), psychadelic, with walking dogs"
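
To try several prompts with the same settings, a small loop around that command is enough. This is a sketch, not something from the repository: `generate_all` and the `RUNNER` override are hypothetical names, and it only uses the flags shown above. It assumes a text file with one prompt per line.

```shell
# Run the command above once per prompt in a file (one prompt per line).
# RUNNER is a hypothetical override so the loop can be dry-run, e.g. RUNNER=echo.
generate_all() {
  prompts_file="$1"
  while IFS= read -r prompt || [ -n "$prompt" ]; do
    [ -z "$prompt" ] && continue   # skip blank lines
    # </dev/null keeps the child process from consuming the rest of the prompt file.
    ${RUNNER:-python3} optimizedSD/optimized_txt2img.py \
      --H 512 --W 768 --n_iter 1 --n_samples 4 --ddim_steps 50 \
      --prompt "$prompt" </dev/null
  done <"$prompts_file"
}

# Usage:
#   RUNNER=echo generate_all prompts.txt   # print the commands without running them
#   generate_all prompts.txt               # actually generate images
```

Each prompt starts a fresh Python process, so the model weights are loaded again every time. That is fine for a handful of prompts but wasteful for long lists.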

Key takeaways

  • It's absolutely bonkers how far we've come with image generation. For my Master's thesis in 2019, we used different variations of CycleGAN to try to create synthetic training data, without success. It could generate 256x256 px images of blurry zebras (with horses as input). Today we can render 4K images of anything. Can't wait to follow the progress on video over the next couple of years.
  • Thank you, Nixpkgs maintainers. I really appreciate the work you do. (Whenever I do a deep dive like this, I am humbled by the amount of work it takes to get dependencies installed correctly.)

Some favorites so far

Monk with rice hat begging on the street, neon light from shops, cyborgs walking past, cinematic

Russian tank on fire, simon stålenhag, animated, dystopia

Robot playing chess, simon stålenhag, animated, cinematic, sci-fi future

Breath of the wild, Link on a surfboard, simon stålenhag

Gandalf the gray, epic scenery, miyasaki 4k

Gandalf the gray, breath of the wild

Turning torso, cinematic, malmö harbor 4k

M.C. Escher mixed with Salvador Dali, impressionist

Renaissance painting of a security guard with a security camera on his head

Cyborg monk, solarpunk, sci-fi, triadic colors, neon light, cinematic, 4k, farm