For years I’ve had a dream of building a rack-mounted PC capable of splitting its resources to host multiple GPU-intensive VMs:
- a few gaming VMs
- a VM for work that can run DaVinci Resolve and Blender renders
- an LLM server
- a Stable Diffusion server
- media server
Just to name a few possibilities…
Every time I’ve looked into it, it seemed like the technology just wasn’t there yet. I remember a few years ago Linus TT took a shot at it, but in the end suggested the technology (for non-commercial entities) just wasn’t in a comfortable spot yet.
So how far off are we? Obviously AI-focused companies seem to make it work, but what possibilities exist for us self-hosters who might also want to run multiple displays in addition to the web GUI LLM servers, and without forking out crazy money for GPU virtualization software licenses?
As others have expressed, we’re already there. Understand, though, that the reason this hasn’t caught on in the mainstream is simple: the entire purpose of what you are asking for runs counter to the standards of commercial capitalism. We are talking about efficiency, self-hosting, doing more with less, and cutting strings.
That said, understand that what you are undertaking is not dissimilar from building infrastructure in a company. You are building and expanding to meet your needs. Your needs are unique, so there isn’t a ‘turnkey’ solution that will fit perfectly… so you need to try things and see what works.
As far as the things you are talking about specifically: you are ultimately going to be dipping your toes into the virtualization world… so XCP-ng and Proxmox are good choices. If you can get your hands on older copies and uh… source a key or two, ESXi is also very beginner friendly, but you won’t be able to upgrade it thanks to their new pricing model. You seem like you are aware of the YouTube sphere, so let me recommend 2GuysTech and the series on different hypervisors.
Once you decide on a hypervisor, it’s as ‘simple’ as building a PC to meet your needs. If you have one already, I’d start there to get a feel for how much you can pull out of it and determine how much you may need. You can probably split up a single GPU or just pass it through (cost vs. performance). LLMs are power / resource hungry, so that may require its own GPU.
If power is cheap where you live, you can look into older server hardware, but honestly this can be a messy space to dabble in (noise, heat, power costs).
From there play with services that fit your needs.
It’s very doable, and there are certainly some easier paths to take… but again, the thing about homelabs is that they’re very custom. This is why the community (in general) is willing to help. We have all had to forge the same path.
100% ^^^ This.
You could do everything with OpenStack, and it would be a great learning experience, but expect to dedicate about 30% of your life to running and managing OpenStack. When it just works, it’s great… when it doesn’t… ohh boy, it’s like a CRPG which will unlock your hardware after you finish the adventure.
Can this solution deliver 3+ streams of high resolution (1440p or higher and 144fps) low latency video with no artifacting and near native performance and responsiveness?
Gaming has a high requirement for high-fidelity, low-latency I/O. No one wants to spend all this money on racks and thin clients and then get laggy windows and scrolling, artifacts, video compression, and low resolution.
That’s the problem at hand with a gaming server: if you want to replace a gaming desktop with a VM in a rack, you need to actually get the I/O to the user somehow, either through dedicated cables from the rack, fiber, or networking. The first is impractical: it involves potentially 100ft-long runs of multiple DisplayPort, HDMI, USB, etc. cables, and is very rigid in its application. The second is very expensive, shooting the price up to thousands of dollars per seat for DisplayPort/USB-over-fiber extenders. As for the third, I have yet to see a VNC/remote solution that can deliver near-native video performance.
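For a rough sense of scale, here is a back-of-the-envelope sketch of the uncompressed bandwidth behind the "3+ streams at 1440p and 144fps" requirement. It assumes 2560x1440 frames at 24 bits per pixel and ignores blanking and protocol overhead; those figures are my assumptions, not something measured on a real setup.

```python
def raw_stream_gbps(width, height, fps, bits_per_pixel=24):
    """Uncompressed video bitrate in Gbit/s (no blanking or protocol overhead)."""
    return width * height * fps * bits_per_pixel / 1e9


# Assumed format: 2560x1440 at 144 fps, 8 bits per colour channel (24 bpp).
per_stream = raw_stream_gbps(2560, 1440, 144)
total = 3 * per_stream

print(f"{per_stream:.2f} Gbit/s per stream, {total:.2f} Gbit/s for 3 streams")
# prints "12.74 Gbit/s per stream, 38.22 Gbit/s for 3 streams"
```

Under these assumptions a single raw stream already exceeds a 10 Gbit/s link, which is why anything sent over ordinary networking ends up compressed.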
I should reiterate: the OP wants to do fidelity-sensitive tasks, like video editing. They don’t just need to work on a spreadsheet.
Yes, for some definition of ‘low latency’.
GeForce NOW, shadow.tech, and Luna all demonstrate this is done at scale every day.
Do your own VM hosting in your own datacenter and you can knock off 10-30ms of latency.
However you define low latency there is a way to iteratively approach it with different costs. As technology marches on, more and more use cases are going to be ‘good enough’ for virtualization.
Quite frankly, if you have an all-optical network, being 1m away or 30km away doesn’t matter.
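To put a rough number on that, here is a small sketch of propagation delay over fiber compared with one frame at 144fps. The refractive index of 1.47 is an assumed typical value for silica fiber, and the calculation only covers time in the glass, not switching, encoding, or decoding.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in vacuum
FIBER_REFRACTIVE_INDEX = 1.47       # assumption: typical silica fiber


def one_way_delay_ms(distance_km):
    """Propagation delay through fiber in milliseconds (glass only)."""
    speed_in_fiber = SPEED_OF_LIGHT_KM_S / FIBER_REFRACTIVE_INDEX
    return distance_km / speed_in_fiber * 1000


frame_time_ms = 1000 / 144  # time budget of one frame at 144 fps

for km in (0.001, 30):
    round_trip = 2 * one_way_delay_ms(km)
    print(f"{km} km: {round_trip:.4f} ms round trip "
          f"({round_trip / frame_time_ms:.1%} of a 144 fps frame)")
```

With these assumptions the 30km round trip comes out at roughly 0.3 ms, a small fraction of a single frame, so the distance itself is rarely what makes a remote session feel laggy.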
Just so we are clear, local isn’t always the clear winner. There are limits on how much power, cooling, noise, storage, and size people find acceptable in their work environment, so every application involves some tradeoff between fully local and distributed.
This. Exactly. Many solutions exist but need to be selected based on scale and personal needs.