Last year I wrote about my long march toward finally getting a home deep learning station with a CUDA-integrated NVIDIA 1080 GPU up and running on CentOS 7. This worked great for a while, until one day I rebooted the machine and the console threw some errors related to the nouveau drivers. After multiple restarts, and doing everything I could to edit the nouveau settings from the GRUB menu, I was still unable to log in. I wasn't worried: I hadn't put a ton of time into configuring the software on the machine, and I planned to just reinstall based on my guide. While I was able to complete the CentOS 7 reinstallation, I never saw the login screen again.
Something about how the GPU interacted with the OS had changed, and I didn't have the courage to open the box back up, take out the GPU, and debug by running the computer off of the motherboard's integrated graphics.¹ Out of a combination of frustration and busyness at work, I just gave up for a while. It wasn't until I was preparing to leave Credijusto and start a new project with an (awesome) NYC-based team of friends from college that getting the home workstation running again became a priority.
On one of my last days in the office, as I explained in exasperation my never-ending struggle to get an NVIDIA workstation running, Ivan Californias casually suggested I try Fedora and handed me a bootable USB drive. Most of my team favored removing the GPU, running off of the motherboard only, and fixing whatever was wrong with the nouveau drivers. One of the big takeaways from last year's struggle to get the system running was still top-of-mind, however: software is cheap, hardware is expensive.
I gave the Fedora install a try, and it went off flawlessly. Eight months after the CentOS 7 setup broke down, I had a desktop again. I decided to get as far as I could with the NVIDIA/CUDA setup on Fedora 29.
Enter Docker
Docker is where it's at for applications that have elaborate system configurations, like setting up a CUDA-integrated deep learning environment. Without belaboring the point, a lot of what I've been struggling with (for a couple of years now!) while doing this work is that system builds of these environments cause all kinds of headaches. Updates to Linux distros, CUDA, and the leading toolkits aren't synced. So you can save your setup code, but it probably won't work in six months. When you're building at the system level, you're trying to hit a bull's-eye in a shifting three-dimensional target space. My moment of truth came when, after two previous installation attempts, I couldn't compile CUDA's test suite because Fedora's gcc version was ahead of the version required by the most recent CUDA release at the time. The beauty of Docker for this specific use case is that once you've configured your NVIDIA drivers, you never have to touch your system configuration again. Using NVIDIA docker, you can access your system's GPU without doing a system install of CUDA. Any other hacky system updates or dependency fine-tuning that you need to do can go into a customized Docker container as well. Building a homemade ML environment using Docker is something I'll leave to an upcoming post, but I will show you how to set up Fedora with your NVIDIA drivers, install nvidia-docker, and build a GPU-integrated tensorflow image. I've written the scripts for configuring the NVIDIA drivers on Fedora here, and the script to install NVIDIA docker, followed by a GPU-integrated pytorch container, here. All I've done here is scrape together info from other people's posts², so big thanks to the open source community and all the enthusiasts who work on these challenges. There are lots of good guides out there for how to put the hardware together if that's the step you're on. My last post about building a DL station has some decent pointers (I think).
Setting up NVIDIA
A bit of prep:
$ sudo dnf install nano
$ sudo dnf install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
A couple of posts also recommended setting SELINUX=disabled in /etc/sysconfig/selinux to turn off SELinux, which they said speeds up downloads. (SELinux is a kernel access-control system, not a firewall, and the change takes effect after the next reboot.)
Change to the root user and update the system kernel:
$ sudo su
$ dnf update
$ reboot
Download the NVIDIA driver installer and make it executable. Note that I'm using the latest version as of when I wrote this; for you it's probably going to be different.
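The Fedora prep steps described in this post (kernel packages, SELinux, kernel update, driver installer) can be collected into one script. This is a minimal sketch under several assumptions: a Fedora system with dnf, a script name of fedora_nvidia_prep.sh (hypothetical), an installer named NVIDIA-Linux-x86_64-<version>.run in the current directory, and, for the last function, Docker 19.03+ with the NVIDIA container toolkit plus the public tensorflow/tensorflow:latest-gpu image. It edits /etc/selinux/config, the file that /etc/sysconfig/selinux links to on Fedora, because `sed -i` run on the symlink would replace the link with a regular file. Setting DRY_RUN=1 makes each function print its commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch of the Fedora/NVIDIA prep steps. Assumptions (not from the original
# post): run as root on Fedora with dnf; the hypothetical file name
# fedora_nvidia_prep.sh; the installer matches NVIDIA-Linux-x86_64-*.run.

# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Install an editor plus kernel sources/headers matching the running kernel.
install_kernel_packages() {
  local kver
  kver="$(uname -r)"
  run dnf install -y nano
  run dnf install -y "kernel-devel-${kver}" "kernel-headers-${kver}"
}

# Set SELINUX=disabled in the given config file (default: /etc/selinux/config,
# the real file behind /etc/sysconfig/selinux). Takes effect after a reboot.
# The ^SELINUX= pattern leaves the SELINUXTYPE= line untouched.
disable_selinux() {
  local conf="${1:-/etc/selinux/config}"
  run sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$conf"
}

# Update all packages, including the kernel; reboot afterwards.
update_system() {
  run dnf update -y
}

# Make every downloaded driver installer in the current directory executable.
prepare_driver_installer() {
  local installer
  for installer in NVIDIA-Linux-x86_64-*.run; do
    # Without nullglob, an unmatched glob stays literal, so check it exists.
    [ -e "$installer" ] || { echo "no NVIDIA-Linux-x86_64-*.run found" >&2; return 1; }
    run chmod +x "$installer"
  done
}

# Once nvidia-docker is installed, check that a container can see the GPU.
# Older nvidia-docker2 setups use --runtime=nvidia instead of --gpus all.
gpu_smoke_test() {
  local image="${1:-tensorflow/tensorflow:latest-gpu}"
  run docker run --gpus all --rm "$image" \
    python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
}

# When executed (not sourced), call the function named by the first argument.
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ "$#" -gt 0 ]; then
  set -euo pipefail
  "$@"
fi
```

For example, `sudo ./fedora_nvidia_prep.sh install_kernel_packages`, then `sudo ./fedora_nvidia_prep.sh disable_selinux`, `sudo ./fedora_nvidia_prep.sh update_system` and a reboot. Keeping each step as a separate function mirrors the post's advice that setup code ages quickly: when one step breaks after a distro or CUDA update, you can rerun or replace just that step.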