  • Breaking Boundaries: The Quantum Revolution Accelerates with Google’s 105-Qubit Willow Chip and Beyond

    In the rapidly evolving field of quantum computing, recent advancements have propelled the technology closer to practical, real-world applications. A significant milestone was achieved with the unveiling of Google’s 105-qubit Willow chip, which demonstrated unprecedented computational capabilities and advancements in quantum error correction. This breakthrough is part of a broader trend of innovations by leading…

  • Xiaomi’s Bold Leap: Crafting Its Own Mobile Chip to Redefine Tech Independence

    Xiaomi is reportedly developing its own mobile processor, aiming to reduce its dependence on major chip suppliers like Qualcomm and MediaTek. Mass production of this in-house designed chip is expected to commence in 2025. This strategic move aligns with China’s broader initiative to enhance technological self-reliance amid global tech rivalries. By creating its own processors,…

  • Mac mini M4 & M4 Pro now available

    Apple has unveiled the latest iterations of its compact desktop lineup: the Mac mini M4 and Mac mini M4 Pro. These new models feature significant enhancements in performance, design, and connectivity, catering to both everyday users and professionals seeking robust computing solutions. Design and Build: The redesigned Mac mini now measures a compact 5 x…

  • High-Performance Matrix Multiplication with FLAME/BLIS: A Deep Dive into DGEMM

    When it comes to scientific computing and machine learning, efficient matrix multiplication is a fundamental building block. Among the most critical operations in linear algebra libraries is DGEMM (Double-precision General Matrix Multiply), which computes the product of two double-precision matrices. In the quest for optimal performance, the BLAS (Basic Linear Algebra Subprograms) interface has been…

  • Navigating Memory Management in NUMA Architectures with Dual Memory Technologies

    Modern applications are increasingly complex, requiring not only higher compute power but also sophisticated memory solutions to achieve optimal performance. NUMA (Non-Uniform Memory Access) architectures are designed to tackle memory performance challenges in systems with multiple processors, allowing each processor to access its own local memory faster than it can access memory attached to other…

  • xAI Colossus: Musk’s HPC Powerhouse Set to Transform AI

    Elon Musk’s xAI initiative is powered by a supercomputing cluster called Colossus, designed for high-performance computing (HPC) on an unprecedented scale. The system features over 10,000 Nvidia H100 GPUs, making it one of the most powerful AI-focused clusters in the world. The integration of these GPUs…

  • ARM SVE2 explained

    Scalable Vector Extension 2 (SVE2) is an updated version of the Scalable Vector Extension (SVE), an instruction set introduced by ARM for its processors to improve performance in high-performance computing (HPC), artificial intelligence (AI), and machine learning (ML). SVE2 builds on SVE, offering several improvements aimed at enhancing performance and versatility, especially for general-purpose and…

  • Supercomputers explained

    What is a Supercomputer? What Supercomputers Can Do That Other Systems Cannot: Supercomputers are essential for pushing the boundaries of science, engineering, and technology by tackling problems that require enormous computational resources far beyond what standard computers can handle.

  • Looking for a gem5 specialist?🚀

    Connect with Pol, a dedicated professional with 10 years of hands-on experience in gem5 simulations. Pol is highly skilled in Garnet NoC interconnects, ARM multi-core processor systems, Ruby cache-coherent systems, and full-system simulation. Whether you’re working on intricate architecture designs or need specialized support for your gem5 projects, Pol’s deep expertise can help you…

  • gem5 standard library overview

    The gem5 standard library, introduced in v21.1 and fully released in v21.2, aims to enhance gem5 users’ productivity by providing commonly used components and features. Tutorials are available to help users utilize the library for creating gem5 simulations, including syscall emulation and full-system simulations. The library offers modularity and extensibility. The central component of the…

  • VNET vs VC in gem5 Garnet NoC

    Anyone first introduced to the Garnet NoC may find the terms VNET and VC confusing. A simple explanation follows. A VNET (Virtual Network) can be considered a separate physical channel that carries one specific type of message. More specifically, VNETs are directly tied to the cache coherence protocol in use.…

  • Empirical roofline tool (ERT) – a benchmark for machine performance characterization

    A well-known and very useful benchmark for characterizing machine performance is the Empirical Roofline Tool (ERT). ERT automatically generates roofline data for a given computer, including the maximum bandwidth for each level of the memory hierarchy and the maximum GFLOP/s rate. This data is obtained using…

  • Benchmark Graviton3E vs Graviton3

    We benchmark Amazon’s recently released HPC platform, Graviton3E, the HPC-oriented version of Graviton3. According to Amazon, the new Hpc7g instances provide up to 35 percent higher vector-instruction processing performance than the standard Graviton3. Additionally, Graviton3E provides twice the floating-point performance of Graviton2. All…

  • HPC news: Tachyum’s Prodigy processor targeting 50 exaFLOPS supercomputer

    Tachyum’s forthcoming chip, Prodigy, is poised to power a colossal 50 exaFLOPS supercomputer, with one customer committing to buying hundreds of thousands of these processors. Prodigy is touted to offer 25 times the performance of the world’s fastest conventional supercomputer, including capabilities for AI performance, featuring hundreds of petabytes of DDR5 memory. Tachyum describes Prodigy…

  • ARM SVE Explained

    ARM Scalable Vector Extension (SVE) is an innovative vector processing technology designed by ARM Holdings, primarily for their ARM-based processors. Here’s a concise explanation in 10 sentences: