I've had a fun time following the buzz about Stable Diffusion (SD), but I've been itching to get my hands dirty with it.
This weekend, I put in a lot of hours to get a flake.nix working for SD. It was rough, but I'm happy now that I'm on the other side of it. I didn't manage to get it 100% declarative, but almost. I've probably also committed 100 sins in terms of what you shouldn't do when packaging stuff with Nix, but hey - it
You can find the repository here.
- You need an NVIDIA GPU with a sufficient amount of memory. For details on how to set this up on NixOS, see Xe's recent blog post.
- You need Nix flakes enabled.
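If you want to check both prerequisites before kicking off the long build, a small bash sketch like the one below can help. The function names check_gpu and check_flakes and the NVIDIA_SMI override are hypothetical, not part of the flake. The flakes check is only a heuristic: it greps the usual nix.conf locations for an uncommented experimental-features line that mentions flakes instead of asking Nix itself, so a setting passed some other way (for example on the command line) won't be detected.

```shell
# Hypothetical pre-flight checks for the prerequisites above (bash).

# NVIDIA_SMI can be overridden to point at a specific binary.
check_gpu() {
  local smi="${NVIDIA_SMI:-nvidia-smi}"
  if ! command -v "$smi" >/dev/null 2>&1; then
    echo "$smi not found; is the NVIDIA driver installed?" >&2
    return 1
  fi
  # -L lists the GPUs the driver can see, one per line
  "$smi" -L
}

# Heuristic only: looks for "flakes" on a non-comment experimental-features
# line in the system and user nix.conf; missing files are silently skipped.
check_flakes() {
  if grep -qs '^[^#]*experimental-features.*flakes' \
      /etc/nix/nix.conf "${XDG_CONFIG_HOME:-$HOME/.config}/nix/nix.conf"; then
    echo "flakes: enabled in nix.conf"
  else
    echo "flakes: not found in nix.conf" >&2
    return 1
  fi
}
```

Running check_gpu && check_flakes should print your GPU(s) and a flakes line; a non-zero exit tells you which prerequisite to fix first.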
Once you're able to run nvidia-smi and see your GPU, do the following:
```shell
git clone firstname.lastname@example.org:skogsbrus/stable-diffusion-nix-flake.git
cd stable-diffusion-nix-flake
nix develop
```
And that's it! But there's a big BUT: this will take a couple of hours (depending on your hardware), since some packages, like OpenCV and PyTorch, have to be compiled with CUDA support. It's probably possible to cut down the compilation time, but I don't care much at this point since I've already got the compiled result... Sorry!
Once the installation completes, download the model weights (sd-v1-4.ckpt) from here and link them into the repository:
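```shell
mkdir -p ../sd-data models/ldm/stable-diffusion-v1
mv ~/Downloads/sd-v1-4.ckpt ../sd-data/sd-v1-4.ckpt
# Absolute target: a relative one is resolved from the link's own directory
ln -s "$(realpath ../sd-data/sd-v1-4.ckpt)" models/ldm/stable-diffusion-v1/model.ckpt
```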
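One pitfall with the symlink step: a relative symlink target is resolved from the directory that contains the link, not from the directory where you ran ln -s. A link created from the repository root with a target like ../sd-data/sd-v1-4.ckpt therefore points at models/ldm/sd-data/sd-v1-4.ckpt and dangles. The helper below is a minimal bash sketch for catching that; the name check_ckpt_link is hypothetical, and it defaults to the models/ldm/stable-diffusion-v1/model.ckpt path used above.

```shell
# Hypothetical helper: verify that the checkpoint symlink resolves to a file.
check_ckpt_link() {
  local link="${1:-models/ldm/stable-diffusion-v1/model.ckpt}"
  # -L is true for any symlink, even a dangling one
  if [ ! -L "$link" ]; then
    echo "not a symlink: $link" >&2
    return 1
  fi
  # -e follows the link, so it is false when the target doesn't exist
  if [ ! -e "$link" ]; then
    echo "dangling symlink: $link -> $(readlink "$link")" >&2
    return 1
  fi
  echo "ok: $link -> $(readlink -f "$link")"
}
```

Run it from the repository root after creating the link: it prints the resolved path on success and returns non-zero for a missing or dangling link.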
After that you can start generating images:
```shell
python3 optimizedSD/optimized_txt2img.py --H 512 --W 768 --n_iter 1 --n_samples 4 \
  --ddim_steps 50 --prompt "Relativity (M.C. Escher), psychadelic, with walking dogs"
```
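To try several prompts without retyping the flags, the command above can be wrapped in a couple of small bash functions. This is only a sketch: sd_txt2img, sd_batch and the SD_PYTHON override are hypothetical names, the flag values are just the ones from the command above, and it assumes you run it from the repository root inside nix develop.

```shell
# Hypothetical wrapper around optimized_txt2img.py with the flags used above.
# SD_PYTHON defaults to python3; set it to echo for a dry run.
sd_txt2img() {
  local prompt="$1"
  # </dev/null keeps the script from consuming sd_batch's prompt list
  "${SD_PYTHON:-python3}" optimizedSD/optimized_txt2img.py \
    --H 512 --W 768 --n_iter 1 --n_samples 4 --ddim_steps 50 \
    --prompt "$prompt" </dev/null
}

# Render every non-empty line of a text file as its own prompt,
# stopping at the first failure.
sd_batch() {
  local prompt
  while IFS= read -r prompt; do
    [ -n "$prompt" ] || continue
    sd_txt2img "$prompt" || return
  done < "$1"
}
```

For example, put one prompt per line in a file such as prompts.txt and run sd_batch prompts.txt. Prefixing the call with SD_PYTHON=echo prints the commands instead of running them, which is handy for checking the quoting of prompts with commas and parentheses.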
- It's absolutely bonkers how far we've come with image generation. In my Master's thesis (2019), we used different variations of CycleGAN to create synthetic training data (without success). It could generate blurry 256x256 px zebras from images of horses. Today we can render 4K images of almost anything. I can't wait to follow the progress on video over the next couple of years.
- Thank you, Nixpkgs maintainers. I really appreciate the work you do. Whenever I do a deep dive like this, I'm humbled by the amount of work it takes to get dependencies installed correctly.
Some favorites so far
Monk with rice hat begging on the street, neon light from shops, cyborgs walking past, cinematic
Russian tank on fire, simon stålenhag, animated, dystopia
Robot playing chess, simon stålenhag, animated, cinematic, sci-fi future
Breath of the wild, Link on a surfboard, simon stålenhag
Gandalf the gray, epic scenery, miyasaki 4k
Gandalf the gray, breath of the wild
Turning torso, cinematic, malmö harbor 4k
M.C. Escher mixed with Salvador Dali, impressionist
Renaissance painting of a security guard with a security camera on his head
Cyborg monk, solarpunk, sci-fi, triadic colors, neon light, cinematic, 4k, farm