High-end medical equipment, such as MRI and ultrasound imaging devices, has enormous compute demands to produce images that help physicians in diagnosis and treatment. Today’s high-power GPUs can be used in add-on appliances to boost the power of such equipment.


Medical applications like computed tomography (CT) scanning and magnetic resonance imaging (MRI) require quick, accurate results from processing complex algorithms, so reducing compute time is a primary challenge for manufacturers of CT and MRI equipment. Other significant challenges include the cost of the computers required to achieve the necessary performance and the space those computers occupy. Three recent technical advances have significantly helped to overcome these issues: the adoption of PCI Express (PCIe) over cable, the emergence of compute acceleration cards (GPUs), and the arrival of PCIe Flash storage cards.

PCIe first emerged as the bus of choice in 2005. One of its many advantages is that the PCIe bus can be extended over a cable to another device. This set the stage for the introduction of many expansion appliances that connect directly to a host computer through the PCIe bus. The expansion enclosure is one such device: it provides external slots that the server can access as if they were in the server itself. Expansion enclosures support a multitude of commercially available PCIe add-in boards. Because these boards operate on the same PCIe bus as the motherboard, no software conversion is required between the server and the device, which reduces latency and makes PCIe more attractive than InfiniBand or other high-speed cabled buses.

At about the same time that the PCIe bus emerged, GPU cards began to be used for general compute acceleration. Many-core GPUs take parallel work off the CPU and deliver results more quickly. Results that took hours to compute on traditional CPUs can now be delivered far sooner. Medical imaging was one of the earliest applications to take advantage of GPU computing. The use of GPUs in this field has matured to the point that a number of medical devices now ship with multiple AMD or NVIDIA GPUs. Other medical devices that employ multiple GPUs in this way include microsurgery robots, wedge prism endoscopes, and surgical stereoscopic composite displays.

Figure 1: For demanding applications like ultrasound, the ability to add processing power can dramatically increase performance.

Multiple GPUs can be installed in some computers, but most do not provide enough power or cooling to accommodate them. PCIe expansion appliances with up to sixteen GPUs have begun to be used in these applications, reducing the number of servers required. Fewer servers mean lower overall cost and less space needed to house them. In addition, GPU appliances can be cabled to more than one server, spreading the workload more efficiently. For example, for demanding applications such as ultrasound (Figure 1), a 2U GPU appliance with four GPUs can be cabled to one or two servers. Each connection provides 128Gb/s of throughput, with additional PCIe switches that allow each GPU to operate at full bandwidth. Using four NVIDIA Tesla K80 GPUs, the CA4000, a 2U GPU appliance from One Stop Systems, delivers 35TFLOPS of computational power (Figure 2).
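
The aggregate figures quoted for these appliances can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the per-board values (4,992 CUDA cores and roughly 8.74TFLOPS of peak single-precision throughput per Tesla K80) are assumptions chosen to match the totals in the text, and the function name is hypothetical. Peak figures say nothing about the sustained performance of a real imaging workload.

```python
from typing import Tuple

# Assumed per-board peak figures for a Tesla K80 (illustrative, not measured
# on the appliance described in the article).
K80_CUDA_CORES = 4992        # CUDA cores per K80 board
K80_PEAK_SP_TFLOPS = 8.74    # approximate peak single-precision TFLOPS per board


def aggregate_compute(num_boards: int) -> Tuple[int, float]:
    """Return (total cores, total peak TFLOPS) for num_boards K80 boards."""
    if num_boards < 0:
        raise ValueError("num_boards must be non-negative")
    total_cores = num_boards * K80_CUDA_CORES
    total_tflops = num_boards * K80_PEAK_SP_TFLOPS
    return total_cores, total_tflops


if __name__ == "__main__":
    # Four boards for the 2U appliance, sixteen for the largest configuration.
    for boards in (4, 16):
        cores, tflops = aggregate_compute(boards)
        print(f"{boards} boards: {cores:,} cores, {tflops:.1f} peak TFLOPS")
```

Under these assumptions, four boards give just under 20,000 cores and about 35TFLOPS, and the same function scales linearly to a sixteen-GPU configuration.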


Figure 2: The CA4000 GPU appliance adds almost 20,000 cores and 35TFLOPS of compute power to one or two servers.

A computing system using NVIDIA Tesla GPUs gives a CT scan system the horsepower it needs to keep pace with the healthcare industry. A configuration of four Tesla GPU processors is able to run through a scanner’s algorithm in less than 20 minutes. By comparison, a 16-processor computer system takes more than twice as long. In addition, a single server with a GPU appliance with four Tesla GPUs is considerably less expensive than the 16-node cluster, which significantly reduces the overall equipment cost.

By using NVIDIA’s GPU computing technology in Techniscan’s Whole Breast Ultrasound system, radiologists can perform a complete ultrasound scan and see the results within a 30-minute patient visit. This eliminates the delay in test results, giving patients and doctors a fast and efficient device that can be relied upon to deliver results at the pace of modern medicine.2

Another application, naked-eye stereoscopy, displays 3D stereoscopic images without the need for special eyeglasses. This intriguing technology not only has applications in entertainment, but is also practical for a variety of imaging applications. One example is Integral Videography (IV). This method uses a special display composed of a micro-lens array, a matrix of convex lenses bonded to a liquid crystal panel. Directly beneath each micro-lens there are some 100 liquid crystal elements, and the convex lens projects the light from each element in a different direction. The object to be represented in 3D space is illuminated by light rays from several directions, forming a stereoscopic image that seems to the user to be floating in air (Figure 3).


Figure 3: Integral videography

Larger numbers of GPUs can be connected together to accomplish even more compute-intensive operations. The High Density Compute Accelerator (HDCA) from One Stop Systems supports sixteen interconnected GPUs cabled to between one and four servers through PCIe, producing 139TFLOPS of computational power using NVIDIA Tesla K80 GPUs. This can be more efficient than employing dozens of servers while reducing rack space to as little as 4U (Figure 4).


Today’s GPUs have a parallel computing architecture that dramatically increases computing performance. This gives users thousands of cores per GPU and multiple GPUs per workstation. The problem is that only a small percentage of these cores are utilized when slow disk I/O stalls data before it reaches them. PCIe NAND Flash boards like Fusion ioFX eliminate this I/O bottleneck so GPUs work at peak performance, greatly accelerating image-processing tasks like encoding and decoding. Furthermore, an ioFX can also accelerate render time in certain cases.4

Rotating disks operate at about 60-120MB/s, and an array of 36 disks with RAID can receive and store data at a rate of about 2GB/s. Solid state drives (SSDs) operate much faster, at about 500MB/s. PCIe Flash cards are faster still, operating at over 2GB/s. Disks and SSDs are limited by the bandwidth of the bus feeding the data, in this case SATA. PCIe Flash cards are not limited by the SATA bus because they operate directly from the PCIe bus. The higher performance of PCIe Flash cards makes them particularly suitable for buffering and caching, with content delivery high on the list of suitable applications. Interconnecting 32 PCIe Flash boards in a single enclosure and then cabling it to one or more servers can provide up to 200TB of fast storage easily accessible by multiple servers.
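
To put these rates in perspective, the following sketch estimates how long each storage tier would take to stream the same dataset, using only the throughput figures quoted above. The 100GB dataset size and the function names are hypothetical, the rotating-disk entry uses the upper end of the quoted range, and the calculation ignores seek time, protocol overhead, and caching.

```python
# Sustained throughput figures quoted in the text, in MB/s.
RATES_MB_PER_S = {
    "rotating disk": 120,      # upper end of the 60-120MB/s range
    "SATA SSD": 500,
    "PCIe Flash card": 2000,   # "over 2GB/s", taken here as 2GB/s
}


def transfer_seconds(size_gb: float, rate_mb_per_s: float) -> float:
    """Seconds needed to stream size_gb gigabytes (1GB = 1000MB) at the given rate."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate_mb_per_s must be positive")
    return size_gb * 1000 / rate_mb_per_s


def per_board_capacity_tb(total_tb: float, boards: int) -> float:
    """Average capacity per board when total_tb is spread evenly over boards."""
    if boards <= 0:
        raise ValueError("boards must be positive")
    return total_tb / boards


if __name__ == "__main__":
    dataset_gb = 100  # hypothetical dataset size, for illustration only
    for name, rate in RATES_MB_PER_S.items():
        print(f"{name}: {transfer_seconds(dataset_gb, rate):.0f} s")
    # 200TB across 32 boards, as in the enclosure described above.
    print(f"Per board: {per_board_capacity_tb(200, 32)} TB")
```

The same arithmetic shows that a 200TB enclosure built from 32 boards implies an average of 6.25TB per board.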

The Fusion ioMemory solution provides a new tier of server memory based on NAND flash technology. Unlike SSDs, which use legacy disk controllers and storage protocols, the ioMemory platform provides direct access to flash memory via the PCIe bus. By eliminating storage controllers and accessing the NAND natively, ioMemory devices are able to run at nearly the speed of DRAM, enabling storage speeds of tens of gigabytes per second within a single server.

Using PCIe expansion, the ratio of Flash cards to servers can be greatly increased. A 3U Flash storage appliance like the FSA from One Stop Systems contains up to 200TB of fast storage. The FSA cables to between one and four servers through PCIe 3.0 x16 connections (Figure 4).
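
The bandwidth of a cabled PCIe connection follows directly from the lane count and the per-lane signaling rate. The sketch below assumes that the 128Gb/s figure quoted earlier refers to the raw rate of a PCIe 3.0 x16 link (8GT/s per lane with 128b/130b encoding); the function name is hypothetical, and the result is a per-direction ceiling before packet and protocol overhead, not a measured throughput.

```python
from typing import Tuple

# Per-lane signaling rate in GT/s and line-encoding efficiency, by PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),      # 8b/10b encoding
    2: (5.0, 8 / 10),      # 8b/10b encoding
    3: (8.0, 128 / 130),   # 128b/130b encoding
}


def link_bandwidth(generation: int, lanes: int) -> Tuple[float, float]:
    """Return (raw Gb/s, usable GB/s) per direction for a PCIe link."""
    if generation not in PCIE_GENERATIONS:
        raise ValueError(f"unsupported PCIe generation: {generation}")
    if lanes <= 0:
        raise ValueError("lanes must be positive")
    rate_gt_per_s, efficiency = PCIE_GENERATIONS[generation]
    raw_gbit_per_s = rate_gt_per_s * lanes
    # Subtract the encoding overhead, then convert bits to bytes.
    usable_gbyte_per_s = raw_gbit_per_s * efficiency / 8
    return raw_gbit_per_s, usable_gbyte_per_s


if __name__ == "__main__":
    raw, usable = link_bandwidth(generation=3, lanes=16)
    print(f"PCIe 3.0 x16: {raw:.0f} Gb/s raw, about {usable:.2f} GB/s usable")
```

Under these assumptions, a PCIe 3.0 x16 link carries 128Gb/s raw, or a little under 16GB/s of usable bandwidth in each direction.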


Figure 4: The CA16000 GPU appliance with NVIDIA K80 GPUs from One Stop Systems adds over 79,800 cores and 139.8TFLOPS of compute power to between one and four servers.

Receiving results quickly and accurately from medical procedures is the primary concern of today’s clinicians. To accomplish this, many applications are turning to high performance computing (HPC). The use of the latest HPC technologies in medical systems is generally known as high performance embedded computing (HPEC). One obvious application is medical imaging, where high volumes of data must be processed and delivered quickly. CT scanning, MRI, and ultrasound all face challenges in meeting these requirements, among them reducing time, cost, and space. Using PCIe expansion to add multiple GPUs and Flash storage cards to a system topology significantly reduces the time it takes for a patient to receive diagnostic results. Fewer servers are required to achieve the required performance, and fewer servers mean less overhead, reducing costs and saving valuable space.

HPEC is rapidly being bolstered by new solutions that dramatically reduce time, cost, and space requirements through appliances that use the latest technologies: PCIe expansion, GPUs, and PCIe Flash storage. Attaching a GPU appliance to one server can outperform 12 servers, cost less, and use less space. A Flash storage appliance can store and retrieve more data faster than racks of rotating disks. This is by far the most innovative approach to meeting the challenges of today’s data-centric world.