VMware Private AI Foundation with NVIDIA - Generative AI Solution

Posted by sysin on 2024-10-17

Run generative AI workloads with NVIDIA accelerated computing combined with the virtual infrastructure management and cloud management of VMware Cloud Foundation.

Visit the original post at https://sysin.org/blog/vmware-private-ai-foundation-nvidia/ for the latest version. This is an original work; please keep the attribution when republishing.

Author's homepage: sysin.org


VMware Private AI Foundation with NVIDIA

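Unlock generative AI and boost productivity with a jointly engineered generative AI platform that addresses privacy, choice, cost, performance, and compliance.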

Unlock next-generation AI and boost productivity

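  • Achieve privacy, security, and compliance

    Use an architectural approach to AI services to keep enterprise data private, secure, and under your control.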

  • Get accelerated performance

    Get the best performance from generative AI models with the integrated software and hardware capabilities of VMware Cloud Foundation and NVIDIA AI Enterprise.

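  • Simplify generative AI deployment and optimize cost

    Get a simpler deployment experience and significant cost efficiency through purpose-built capabilities such as vector databases and deep learning virtual machines.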

Build and deploy private and secure generative AI models

  • Guided deployment

    Significantly speed up deployment through guided deployment of workload domains and related components.

  • Vector databases for RAG workflows

    Vector databases backed by pgvector on PostgreSQL enable fast data queries and real-time updates that enhance the output of LLMs.

  • Catalog setup wizard

    Simplify infrastructure provisioning for complex projects with curated and optimized AI infrastructure catalog items.

  • GPU monitoring

    Gain visibility into GPU resource utilization across clusters and hosts, so that you can optimize GPU usage for performance and cost.

  • Deep learning VM templates

    Improve environment consistency with preconfigured deep learning virtual machines.

  • NVIDIA NeMo Retriever

    Enhance RAG capabilities with a collection of NVIDIA CUDA-X generative AI microservices that let organizations seamlessly connect custom models to diverse business data.

  • NVIDIA NIM Operator

    Simplify moving RAG applications into production with NVIDIA AI workflow examples, without rewriting code.

  • NVIDIA NIM

    Achieve seamless AI inference at scale with a set of easy-to-use microservices designed to accelerate generative AI deployment in the enterprise.

  • NVIDIA GPU Operator

    Automatically manage the lifecycle of the software required to use GPUs with Kubernetes, improving GPU performance, utilization, and telemetry.
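
Once the NVIDIA GPU Operator is running on a TKG cluster, its device plugin advertises GPUs to Kubernetes as the extended resource `nvidia.com/gpu`, and a workload asks for a GPU through its resource limits. The following is a minimal sketch, using only the Python standard library, that builds such a Pod manifest as JSON (a format `kubectl apply -f -` accepts). The pod name, namespace, command, and image path are hypothetical placeholders; in a disconnected environment the image would come from your local Harbor registry.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1,
                     namespace: str = "default") -> dict:
    """Build a Pod manifest that requests `gpus` NVIDIA GPUs.

    The GPU Operator's device plugin exposes GPUs as the extended
    resource "nvidia.com/gpu"; extended resources are requested via
    limits (the request defaults to the same value).
    """
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "cuda",
                    "image": image,
                    # Hypothetical smoke test: list the visible GPUs and exit.
                    "command": ["nvidia-smi"],
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical image path in a local Harbor project.
    manifest = gpu_pod_manifest(
        "gpu-smoke-test",
        "harbor.example.com/nvidia/cuda:12.4.1-base-ubuntu22.04",
    )
    # Pipe the output into: kubectl apply -f -
    print(json.dumps(manifest, indent=2))
```

Requesting GPUs only through `limits` keeps the manifest minimal; the scheduler then places the pod only on a node where the GPU Operator has made enough GPUs allocatable.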

System Architecture

System Architecture of VMware Private AI Foundation with NVIDIA

VMware Private AI Foundation with NVIDIA runs on top of VMware Cloud Foundation and adds support for AI workloads in VI workload domains that have the vSphere IaaS control plane enabled. You provision these AI workloads by using kubectl and VMware Aria Automation.

Example Architecture for VMware Private AI Foundation with NVIDIA

  • GPU-enabled ESXi hosts

    ESXi hosts that have an NVIDIA GPU supported for VMware Private AI Foundation with NVIDIA and the NVIDIA vGPU host manager driver installed. The GPU is shared between workloads by using time slicing or the Multi-Instance GPU (MIG) mechanism, and the host manager driver lets you use vGPU profiles based on MIG or time slicing.

  • Supervisor

    One or more vSphere clusters enabled for the vSphere IaaS control plane, so that you can run virtual machines and containers on vSphere by using the Kubernetes API. A Supervisor is itself a Kubernetes cluster that serves as the control plane for managing workload clusters and virtual machines.

  • Harbor registry

    A local image registry in a disconnected environment, where you host the container images downloaded from the NVIDIA NGC catalog.

  • NSX Edge cluster

    A cluster of NSX Edge nodes that provides two-tier north-south routing for the Supervisor and the workloads it runs. The Tier-0 gateway on the NSX Edge cluster is in active-active mode.

  • NVIDIA Operators

    NVIDIA GPU Operator automates the management of all NVIDIA software components needed to provision GPUs to containers in a Kubernetes cluster. NVIDIA Network Operator configures the right Mellanox drivers for containers that use virtual functions for high-speed networking, RDMA, and GPUDirect, and works together with the GPU Operator to enable GPUDirect RDMA on compatible systems. Both operators are deployed on a TKG cluster.

  • Vector database

    A PostgreSQL database with the pgvector extension enabled, so that you can use it in Retrieval Augmented Generation (RAG) AI workloads.

  • NVIDIA Licensing Portal and NVIDIA Delegated License Service (DLS)

    You use the NVIDIA Licensing Portal to generate a client configuration token that assigns a license to the guest vGPU driver in the deep learning virtual machine and to the GPU Operators on TKG clusters. In a disconnected environment, or to let your workloads obtain license information without an Internet connection, you host the NVIDIA licenses locally on a Delegated License Service (DLS) appliance.

  • Content library

    Content libraries store the images for the deep learning virtual machines and for the Tanzu Kubernetes releases. You use these images to deploy AI workloads in the VMware Private AI Foundation with NVIDIA environment. In a connected environment, content libraries pull their content from VMware-managed public content libraries. In a disconnected environment, you must upload the required images manually or pull them from an internal content library mirror server.

  • NVIDIA GPU Cloud (NGC) catalog

    A portal of GPU-optimized containers for AI and machine learning that are tested and ready to run on supported NVIDIA GPUs on premises, on top of VMware Private AI Foundation with NVIDIA.
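
The vector database is what makes the retrieval step of RAG work: document chunks are stored alongside embedding vectors, and a query returns the chunks whose embeddings are closest to the query embedding. With pgvector this is typically written as `ORDER BY embedding <=> query LIMIT k`, where `<=>` is pgvector's cosine distance operator. The minimal sketch below reproduces that ranking in plain Python so the arithmetic is visible. The three-dimensional vectors and chunk texts are made-up toy data; a real deployment would use embeddings produced by an embedding model and let PostgreSQL perform the search.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, the value pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(rows: List[Tuple[str, List[float]]], query: Sequence[float],
          k: int) -> List[str]:
    """Plain-Python equivalent of:

        SELECT content FROM chunks ORDER BY embedding <=> :query LIMIT :k;
    """
    ranked = sorted(rows, key=lambda row: cosine_distance(row[1], query))
    return [content for content, _ in ranked[:k]]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings" (hypothetical data).
    chunks = [
        ("GPU quota policy", [0.9, 0.1, 0.0]),
        ("Holiday calendar", [0.0, 0.2, 0.9]),
        ("vGPU profile guide", [0.8, 0.3, 0.1]),
    ]
    context = top_k(chunks, query=[1.0, 0.2, 0.0], k=2)
    # The retrieved chunks would then be added to the LLM prompt.
    print(context)  # → ['GPU quota policy', 'vGPU profile guide']
```

This brute-force scan is exactly what an index is meant to avoid at scale; pgvector's approximate indexes trade a little recall for much faster lookups over large tables.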

As a cloud administrator, you use the following management components in VMware Cloud Foundation:

  • SDDC Manager

    You use SDDC Manager to deploy a GPU-enabled VI workload domain that is based on vSphere Lifecycle Manager images and to add clusters to it; to deploy NSX Edge clusters in VI workload domains for use by Supervisor instances, and in the management domain for the VMware Aria Suite components of VMware Private AI Foundation with NVIDIA; and to deploy a VMware Aria Suite Lifecycle instance that is integrated with the SDDC Manager repository.

  • VI Workload Domain vCenter Server

    You use this vCenter Server instance to enable and configure a Supervisor.

  • VI Workload Domain NSX Manager

    SDDC Manager uses this NSX Manager to deploy and update NSX Edge clusters.

  • VMware Aria Suite Lifecycle

    You use VMware Aria Suite Lifecycle to deploy and update VMware Aria Automation and VMware Aria Operations.

  • VMware Aria Automation

    You use VMware Aria Automation to add self-service catalog items that DevOps engineers and data scientists use to deploy AI workloads.

  • VMware Aria Operations

    You use VMware Aria Operations to monitor GPU consumption in the GPU-enabled workload domains.

  • VMware Data Services Manager

    You use VMware Data Services Manager to create vector databases, such as a PostgreSQL database with the pgvector extension.
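
Whether or not the provisioned PostgreSQL database already has pgvector enabled, a RAG application still needs a table with a `vector` column, and usually an index on it, before it can store and search embeddings. The minimal sketch below only assembles the SQL statements as strings, so it runs without a database. `CREATE EXTENSION vector`, the `vector(n)` column type, and an HNSW index with `vector_cosine_ops` are standard pgvector features (HNSW indexes require pgvector 0.5.0 or later); the helper name, table name, column names, and the 1024-dimension size are hypothetical and the dimension must match your embedding model.

```python
import re
from typing import List

# Only simple lowercase identifiers are accepted, because the table name
# is interpolated directly into SQL text.
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def pgvector_setup_sql(table: str, dims: int) -> List[str]:
    """Return SQL statements that prepare a pgvector table for RAG.

    `dims` must equal the length of the embeddings you will insert.
    """
    if not _IDENT.match(table):
        raise ValueError(f"unsafe table name: {table!r}")
    if dims < 1:
        raise ValueError("dims must be positive")
    return [
        # Idempotent: no error if the extension is already enabled.
        "CREATE EXTENSION IF NOT EXISTS vector;",
        f"CREATE TABLE IF NOT EXISTS {table} ("
        f"id bigserial PRIMARY KEY, content text NOT NULL, "
        f"embedding vector({dims}) NOT NULL);",
        # Approximate nearest-neighbor index for cosine distance (<=>).
        f"CREATE INDEX IF NOT EXISTS {table}_embedding_idx "
        f"ON {table} USING hnsw (embedding vector_cosine_ops);",
    ]


if __name__ == "__main__":
    # Hypothetical table name and embedding size.
    for stmt in pgvector_setup_sql("doc_chunks", 1024):
        print(stmt)
```

The index operator class must match the distance operator used in queries: an index built with `vector_cosine_ops` accelerates `<=>` searches, not `<->` (Euclidean) ones.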

Related VMware Components

VMware Components in VMware Private AI Foundation with NVIDIA

VMware Cloud Foundation 5.2

The functionality of the VMware Private AI Foundation with NVIDIA solution is available across several software components.

  • VMware Cloud Foundation 5.2
  • VMware Aria Automation 8.18
  • VMware Aria Operations 8.18
  • VMware Data Services Manager 2.1

VMware Cloud Foundation 5.1

  • VMware Cloud Foundation 5.1.1
  • VMware Aria Automation 8.16.2 and VMware Aria Automation 8.17
  • VMware Aria Operations 8.16 and VMware Aria Operations 8.17.1
  • VMware Data Services Manager 2.0.x

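VMware Private AI Foundation with NVIDIA supports two use cases:

  • Development use case
    Cloud administrators and DevOps engineers can provision AI workloads, including Retrieval-Augmented Generation (RAG), in the form of deep learning virtual machines. Data scientists can use these deep learning virtual machines for AI development.
  • Production use case
    Cloud administrators can provide DevOps engineers with a VMware Private AI Foundation with NVIDIA environment for provisioning production-ready AI workloads on Tanzu Kubernetes Grid (TKG) clusters on the vSphere IaaS control plane.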

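Related products:

  • VMware Cloud Foundation 5.2 - Leading multi-cloud platform
  • VMware Aria Suite 8.18 released - Cloud management solution
  • VMware Data Services Manager 2.1 - Database management and data services management
  • VMware vSphere 8.0 Update 3b download - Enterprise workload platform
  • VMware Tanzu Kubernetes Grid (TKG) 2.5.2 - Enterprise Kubernetes solution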