Using these new TensorRT-LLM optimizations, NVIDIA has delivered a 2.4x performance gain with its current H100 AI GPU from MLPerf Inference 3.1 to 4.0 on the GPT-J test in the offline scenario.
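For readers who want a feel for what running GPT-J through TensorRT-LLM looks like, here is a minimal sketch following the library's high-level LLM API quickstart pattern; the GPT-J checkpoint name and sampling settings are illustrative assumptions rather than details from the benchmark article, and an installed tensorrt_llm package plus a supported NVIDIA GPU are assumed.

    # Minimal TensorRT-LLM sketch (assumptions: tensorrt_llm is installed, a supported
    # GPU is available, and EleutherAI/gpt-j-6b is used purely as an example checkpoint).
    from tensorrt_llm import LLM, SamplingParams

    prompts = ["Summarize the MLPerf Inference offline scenario in one sentence."]
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

    # The LLM class builds/loads a TensorRT engine for the given Hugging Face model.
    llm = LLM(model="EleutherAI/gpt-j-6b")

    # Offline-style batch generation: all prompts are submitted at once.
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        print(output.prompt, "->", output.outputs[0].text)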
Nvidia is aiming to dramatically accelerate and optimize the deployment of generative AI large language models (LLMs) with a new approach to delivering models for rapid inference. At Nvidia GTC today, ...
At its GTC conference, Nvidia today announced Nvidia NIM, a new software platform designed to streamline the deployment of custom and pre-trained AI models into production environments. NIM takes the ...
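As background on how NIM microservices are typically consumed: each one exposes an OpenAI-compatible HTTP endpoint. The sketch below assumes a NIM container is already running locally on port 8000 and serving a model published as "meta/llama3-8b-instruct"; both the port and the model name are illustrative assumptions, not details from the announcement.

    # Hypothetical local NIM endpoint; adjust host, port, and model to your deployment.
    import requests

    resp = requests.post(
        "http://localhost:8000/v1/chat/completions",  # OpenAI-compatible route exposed by NIM
        json={
            "model": "meta/llama3-8b-instruct",        # assumed example model name
            "messages": [{"role": "user", "content": "What does NVIDIA NIM do?"}],
            "max_tokens": 128,
        },
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])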
Apple and NVIDIA shared details of a collaboration to improve the performance of LLMs with a new text generation technique for AI. Cupertino writes: Accelerating LLM inference is an important ML ...
Developers can now take advantage of NVIDIA NIM packages to deploy enterprise generative AI, said NVIDIA CEO Jensen Huang. NVIDIA’s newest GPU platform is Blackwell, which companies ...
A crafted inference request in Triton’s Python backend can trigger a cascading attack, giving remote attackers control over AI-serving environments, researchers say. A surprising attack chain in ...
Rated horsepower for a compute engine is an interesting intellectual exercise, but it is where the rubber hits the road that really matters. We finally have the first benchmarks from MLCommons, the ...
Nvidia has released an analysis showing a 4X to 10X reduction in cost per token for AI inference by switching to open source models. The cost reductions required combining Blackwell hardware with two ...
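As a rough illustration of how a cost-per-token comparison like this is usually framed (the numbers below are made up for the example and are not NVIDIA's figures): cost per token is essentially the hourly price of the hardware divided by the number of tokens it can generate in an hour.

    # Toy cost-per-token calculation with made-up inputs (not NVIDIA's data).
    gpu_cost_per_hour = 4.00        # assumed $/hour for one accelerator
    throughput_tok_per_s = 5_000    # assumed sustained generation throughput

    tokens_per_hour = throughput_tok_per_s * 3600
    cost_per_million_tokens = gpu_cost_per_hour / tokens_per_hour * 1_000_000
    print(f"${cost_per_million_tokens:.3f} per million tokens")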