For years I’ve had a dream of building a rack-mounted PC capable of splitting its resources to host multiple GPU-intensive VMs:

  • a few gaming VMs
  • a VM for work that can run DaVinci Resolve and Blender renders
  • an LLM server
  • a Stable Diffusion server
  • a media server

Just to name a few possibilities…

Every time I’ve looked into it, it seemed like the technology just wasn’t there yet. I remember a few years ago Linus Tech Tips took a shot at it, but in the end suggested the technology (for non-commercial entities) just wasn’t in a comfortable spot yet.

So how far off are we? Obviously AI-focused companies seem to make it work, but what possibilities exist for us self-hosters who might also want to run multiple displays in addition to the web GUI LLM servers? And without forking out crazy money for GPU virtualization software licenses?

  • jet@hackertalks.com
    5 months ago

    Yes, for some definition of ‘low latency’.

    GeForce NOW, shadow.tech, and Luna all demonstrate that this is done at scale every day.

    Do your own VM hosting in your own datacenter and you can knock off 10-30 ms of latency.

    However you define low latency, there is a way to iteratively approach it at different costs. As technology marches on, more and more use cases are going to be ‘good enough’ for virtualization.

    Quite frankly, if you have an all-optical network, being 1 m away or 30 km away doesn’t matter.
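
    As a rough back-of-the-envelope check on that distance claim, here is a minimal sketch of propagation delay over fiber. It assumes light travels through the fiber at about c / 1.47 (a typical refractive index for silica, my assumption rather than a measured figure), and the function name is just illustrative. It only covers time on the wire, not encoding, switching, or display latency.

```python
# Assumption: signal speed in silica fiber is roughly c / 1.47.
C_VACUUM_M_PER_S = 299_792_458
FIBER_REFRACTIVE_INDEX = 1.47  # assumed typical value, not a measurement


def fiber_round_trip_ms(distance_m: float) -> float:
    """Round-trip propagation delay over a fiber run, in milliseconds."""
    speed_m_per_s = C_VACUUM_M_PER_S / FIBER_REFRACTIVE_INDEX
    return 2 * distance_m / speed_m_per_s * 1000


# Compare a machine under the desk with one 30 km away.
for label, metres in [("1 m", 1), ("30 km", 30_000)]:
    print(f"{label}: {fiber_round_trip_ms(metres):.4f} ms round trip")
```

    Under that assumption the 30 km round trip comes out at roughly 0.3 ms, which is small next to the 10-30 ms mentioned above.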

    Just so we are clear, local isn’t always the winner: there are limits on how much power, cooling, noise, storage, and size people find acceptable in their work environment. So every application involves some tradeoff between all-local and distributed.

    • yggstyle@lemmy.world
      5 months ago

      This. Exactly. Many solutions exist but need to be selected based on scale and personal needs.