zjournal
 
   





Networking With Linux on System z

 

Today, almost every IT application needs to exchange large quantities of information. High-bandwidth links provide the networking infrastructure for business and leisure data transfer worldwide. Ethernet LANs now provide transfer rates of up to 10 gigabits per second (Gbps). To illustrate this speed, transferring the 700MB contents of a CD-ROM would take approximately half a second.
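
The half-second figure can be checked with a short calculation. The sketch below assumes decimal units (1MB = 1,000,000 bytes) and an ideal link with no protocol overhead or latency; the function name is illustrative only.

```python
def transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Ideal transfer time, ignoring protocol overhead and latency."""
    return size_bytes * 8 / link_bits_per_second


cd_rom_bytes = 700 * 10**6   # 700MB CD-ROM image (decimal megabytes assumed)
ten_gbps = 10 * 10**9        # 10 Gbps link rate

print(f"{transfer_seconds(cd_rom_bytes, ten_gbps):.2f} seconds")  # 0.56 seconds
```

Real transfers take somewhat longer because frame headers, acknowledgments, and the storage devices at either end all add overhead.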

 

Linux on System z, the system of choice for medium- to large-scale business applications, supports all standard network interfaces and protocols and offers several powerful features unique to Linux on a mainframe.

 

PCs usually run a single operating system instance natively, which controls the whole computer. In contrast, IBM System z supports the important concept of virtualization. All operating systems, including Linux, execute in a virtual environment. There may be (and normally are) instances of several different operating systems running simultaneously. To learn more, refer to Linux on the Mainframe (John Eilert et al., Prentice Hall, 2003, ISBN 0131014153) and IBM’s Redbooks (www.redbooks.ibm.com/).

 

The mainframe is divided into Logical Partitions (LPARs). Each LPAR can either directly execute an operating system, such as the Linux kernel (see Figure 1), or run z/VM, which hosts a number of operating system images as “guests.” z/VM is thus the second way to create a virtual environment for executing Linux. A network connection between virtualized Linux systems may itself be virtual. In this case, the communicating parties behave as if they were sending and receiving data via a real network, but the data transfer is much faster because it is actually performed by moving data within the mainframe’s main memory.

 

This article presents a short overview of the layered structures of today’s network architectures as well as the networking features provided by the IBM System z mainframe and how they’re used under Linux. Figure 2 summarizes the networking features mentioned in this article that recent Linux distributions support.

 

Networking Infrastructure Basics

 

The networking infrastructure is described in two major “blueprints” that are the basic framework for many communication protocols:

 

• The Open Systems Interconnection (OSI) reference model defined by the International Organization for Standardization (ISO)

• The TCP/IP reference model that evolved with the rise of the Internet.

 

Figure 3 depicts the layers of these models.

 

The OSI reference model comprises seven layers, each responsible for certain aspects of networking functionality. It spans from the Physical Layer (Layer 1) to the Application Layer (Layer 7), which provides network services, such as email and File Transfer Protocol (FTP) applications, to the user. The Data Link Layer (Layer 2) defines the format of the frames sent on the physical media. The predominant format today is Ethernet, which uses 48-bit Media Access Control (MAC) addresses to identify the sender and receiver of a frame. Ethernet frames can carry both IP and non-IP traffic (e.g., IPX, NetBIOS, and SNA protocols). Older machines still support Token Ring.
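
To make the Layer 2 frame format concrete, the following minimal sketch unpacks the 14-byte header of an Ethernet II frame: two 48-bit MAC addresses followed by a 16-bit type field that tells the receiver which protocol (IP or non-IP) the payload carries. The sample frame and the function names are made up for illustration; the sketch ignores VLAN tags and 802.3 length-style headers.

```python
import struct


def format_mac(raw: bytes) -> str:
    """Render a 6-byte (48-bit) MAC address in colon-separated hex."""
    return ":".join(f"{byte:02x}" for byte in raw)


def parse_ethernet_header(frame: bytes):
    """Return (destination MAC, source MAC, EtherType) of an Ethernet II frame."""
    if len(frame) < 14:
        raise ValueError("an Ethernet header is 14 bytes long")
    destination, source, ethertype = struct.unpack("!6s6sH", frame[:14])
    return format_mac(destination), format_mac(source), ethertype


# Hypothetical frame: locally administered MAC addresses, EtherType 0x0800 (IPv4).
sample = bytes.fromhex("020000000001" "020000000002" "0800") + b"payload"
dst, src, ethertype = parse_ethernet_header(sample)
print(dst, src, hex(ethertype))  # 02:00:00:00:00:01 02:00:00:00:00:02 0x800
```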

 

The Network Layer (Layer 3) provides the means to transfer data to its final destination by routing each packet through different networks. The Internet Protocol (IP) performs the Layer 3 functionality in the worldwide Internet. IP version 4 (IPv4) addresses are 4 bytes long; addresses of its successor, IPv6, are 16 bytes long. To learn more about networking with Linux, see Linux Network Administrator’s Guide (Olaf Kirch and Terry Dawson, O’Reilly Media Inc., ISBN 13: 9781565924000; also available online at http://oreilly.com/catalog/linag2/book/index.html).
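
The two address lengths can be verified with Python’s standard ipaddress module. This is only an illustration; the addresses are taken from the ranges reserved for documentation, and the function name is hypothetical.

```python
import ipaddress


def address_length_bytes(text: str) -> int:
    """Length in bytes of the binary (on-the-wire) form of an IP address."""
    return len(ipaddress.ip_address(text).packed)


for text in ("192.0.2.10", "2001:db8::10"):
    address = ipaddress.ip_address(text)
    print(f"IPv{address.version}: {address_length_bytes(text)} bytes")
# IPv4: 4 bytes
# IPv6: 16 bytes
```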

 

System z Network Equipment

 

Connecting a computer to a network requires a Network Interface Card (NIC) for exchanging data between the computer’s memory and the network media. A NIC typically implements OSI Layer 2 or Layer 3. IBM System z uses the Open Systems Adapter (OSA) for this purpose. The OSA card is the preferred communications device for the mainframe architecture. This article applies to the OSA-Express2 feature; specific functions of other OSA-Express features are explicitly noted. OSA-Express2 is supported on IBM System z9 and z10; the eServer zSeries machines support its predecessor, OSA-Express.

 

Each NIC is represented in the Linux kernel as an instance of a networking interface. Besides physical LANs, Linux can easily connect to LAN segments implemented in IBM System z firmware (named HiperSockets) and to LAN segments implemented in z/VM (named GuestLAN). HiperSockets are attached via IQDIO, and a GuestLAN is attached via a virtual NIC; both provide connectivity similar to an OSA card (see Figure 4).

 

Linux on System z networking configuration is described in “Device Drivers, Features, and Commands,” a document available on the IBM developerWorks Linux pages at www.ibm.com/developerworks/linux/linux390/development_documentation.html.

 

The kernel-resident network interfaces can be listed with the ‘ifconfig’ utility. Figure 5 shows a sample ifconfig output where eth0 is an OSA-Express of type OSD_1000, hsi0 is of type HiperSockets, and hsi1 is a GuestLAN (type Hiper).

 

The strategic device driver for OSA-Express is qeth; it names the networking interfaces eth or hsi, and the Linux kernel adds an instance number as a suffix, as in eth0. Applications use the socket interface and connect to the target IP address. The network stack maintains a routing table that maps destination network addresses to the interface to be used. Figure 6 shows a sample route table.
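
The way a routing table maps a destination to an interface can be sketched as a longest-prefix match. The table below is hypothetical (it is not the content of Figure 6) and only reuses the interface names eth0 and hsi0 mentioned earlier; the kernel’s real lookup also considers gateways, metrics, and policy rules.

```python
import ipaddress

# Hypothetical routing table: (destination network, interface).
ROUTES = [
    ("192.168.10.0/24", "hsi0"),   # HiperSockets segment (assumed addresses)
    ("10.1.0.0/16", "eth0"),       # local OSA-attached LAN (assumed addresses)
    ("0.0.0.0/0", "eth0"),         # default route
]


def select_interface(destination: str, routes=ROUTES):
    """Pick the interface of the most specific route that covers the destination."""
    address = ipaddress.ip_address(destination)
    candidates = []
    for network_text, interface in routes:
        network = ipaddress.ip_network(network_text)
        if address in network:
            candidates.append((network.prefixlen, interface))
    if not candidates:
        return None
    # The longest prefix is the most specific match.
    return max(candidates)[1]


print(select_interface("192.168.10.7"))  # hsi0
print(select_interface("198.51.100.1"))  # eth0 (default route)
```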

 

The Linux Device Model

 

The Linux device model, introduced in Linux 2.6, is an abstraction of processors, bus controllers, and devices. It reflects which drivers are used for specific components and how the components are connected. The device model is exposed through the pseudo filesystem sysfs. Pseudo filesystems are a concept specific to UNIX-based operating systems; they offer access to devices or operating system features through special files that don’t contain data stored on a disk but instead trigger certain actions. A device is represented as a node (subdirectory) in the component tree; its attribute files externalize device status information and allow manipulation of device or driver parameters.

 

For example, the directory /sys/bus/ccwgroup/drivers/qeth represents the qeth device driver controlling the OSA adapters and HiperSockets. In this directory, the qeth device driver shows all the devices it controls, referred to by their bus IDs; for example, the directory /sys/bus/ccwgroup/drivers/qeth/0.0.f5f0 represents the OSA adapter with this address. In this directory, you’ll find all attribute files related to this device, such as card_type, online, and layer2, among others.
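
Because attribute files are ordinary files from a program’s point of view, they can be read with normal file I/O. The following minimal sketch reads the three attributes named above for the device 0.0.f5f0; the function name and its driver_dir parameter are illustrative, and attributes that are missing or unreadable (for example, on a machine without this device) are reported as None.

```python
from pathlib import Path

QETH_DRIVER_DIR = Path("/sys/bus/ccwgroup/drivers/qeth")


def read_qeth_attributes(bus_id, names=("card_type", "online", "layer2"),
                         driver_dir=QETH_DRIVER_DIR):
    """Read sysfs attribute files of one qeth device into a dictionary."""
    device_dir = Path(driver_dir) / bus_id
    values = {}
    for name in names:
        try:
            values[name] = (device_dir / name).read_text().strip()
        except OSError:
            # Attribute missing or not readable on this system.
            values[name] = None
    return values


print(read_qeth_attributes("0.0.f5f0"))
```

Writing to an attribute file changes the corresponding device or driver parameter, so write access normally requires root privileges.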

 

OSA NICs were designed for high-speed network access and use three subchannels to exchange data and control information. The basic protocol that handles the data transfer is defined in the Queued Direct Input/Output (QDIO) data transfer architecture.

 

QDIO Mode

 

In non-QDIO mode (conventional Channel Command Word [CCW] operation), the system assigns an I/O processor, a channel, and a control unit to perform the I/O operation. In contrast, QDIO mode allows the OSA-Express card to read and write data directly in system memory, considerably reducing the TCP/IP path length. The subchannel assigned to the data transfer is organized into queues. Up to four queues are used for output; one queue is used for input. Each queue consists of 64KB buffers. The output queues have a fixed, pre-defined size of 128 buffers; the input queue has a default of 16 buffers. QDIO mode offers several advantages:

 

• A 20 percent improvement in performance compared to non-QDIO mode

• Reduction of System Assist Processor (SAP) utilization

• Improved response time.

 

The original design of the basic protocol between the device driver and the QDIO device (pre-OSA-Express2) defined the IP networking layer (Layer 3) as the method for packet exchange.

 

Layer 3 vs. Layer 2 OSA

 

Early OSA models included all OSI layers up to Layer 3. However, the versatility of Linux soon revealed that Layer 3 packet exchange between the QDIO device and the network stack was insufficient. To fully exploit the flexible network functions provided by Linux, later OSA models offer access at Layer 2. In particular, this allows data packets to be prefixed with an Ethernet header, as required, for example, by the sniffing application tcpdump, and it supports non-IP networking protocols. Layer 2 OSA also enables services such as channel bonding (link aggregation), which requires the ability to assign a MAC address to a NIC.

 

OSA

 

All OSA-Express adapters offer full duplex and direct attachment to the Self-Timed Interface (STI) bus I/O infrastructure in the System z and zSeries. The STI bus enables the Direct Memory Access (DMA) of the OSA adapter. The adapter has a unique factory built-in Ethernet MAC address, which is tagged to packets passed from hosts operating in the Layer 3 interface. The OSA-Express adapter provides many more features:

 

• Priority queuing: The output queues have different priorities assigned. The device driver enqueues IP message traffic according to the priority assigned in the IP header.

• Enhanced IP network availability: Linux registers all home IP addresses from the network stack in the OSA-Express adapter. The OSA adapter then responds to Address Resolution Protocol (ARP) requests that ask for resolution of these IP addresses.

• LPAR-to-LPAR communication: Access to an OSA-Express2 port can be shared among the system images to which the channel path is defined to be shared. When a port is shared, OSA forwards IP traffic between these images directly to its destination without sending the IP packets out to the LAN.

• OSA for Network Control Program (NCP): This enables the Communications Controller for Linux (CCL) to emulate the NCP functionality, eliminating the need to run an Enterprise Systems Connection (ESCON)-attached 374x Communications Controller in the z/OS and z/VSE environment. The OSA NCP (OSN) acts as a bridge between the operating system (z/OS and z/VSE) using the Channel Data Link Control (CDLC) protocol and Linux using the QDIO architecture.

• Other IP assist functionality: IP assists relieve the TCP/IP stack from performing compute-intensive functions such as broadcast packet filtering, multi-cast support, and checksum calculation.

• LAN connectivity: OSA-Express2 supports 1000BASE-T Ethernet, Gigabit Ethernet (GbE), and 10GbE. 1000BASE-T Ethernet supports a link rate of 10, 100, or 1,000Mbps over a copper infrastructure. The Gigabit Ethernet features are based on fiber infrastructure. The legacy infrastructures, Token Ring and Asynchronous Transfer Mode (ATM) in LAN emulation mode, are supported with OSA-Express up to IBM z990 systems.

• New generation LAN access with OSA-Express3: OSA-Express3 is supported in System z10 Enterprise Class (EC). It provides a hardware data router function that allows packets to flow directly from host memory to the LAN without firmware intervention. Together with new generations of hardware equipment, this reduces latency and achieves full 10GbE line speed. With Gigabit Ethernet (Multiport), OSA-Express3 has double the port density (four ports per feature) of its predecessor, OSA-Express2.

 

HiperSockets

 

Mainframe HiperSockets technology provides high-speed TCP/IP connectivity in a central processor complex, supported by System z and eServer zSeries. HiperSockets eliminates the need to traverse an external network connection to communicate between LPARs in the same System z. Packets are copied from the sender’s output queue to the receiver’s input queue in a synchronous fashion. The virtual device, named IQDIO, is built on the same principles as QDIO mode, hence the term “internal QDIO.” A complete description is found in the HiperSockets Implementation Guide (www.redbooks.ibm.com/redbooks/pdfs/sg246816.pdf).

 

The IBM System z10 adds Layer 2 support; previous systems operated solely in Layer 3 mode. A feature specific to HiperSockets is its large Maximum Transmission Unit (MTU) size. Depending on the type, HiperSockets supports an MTU size of 8KB, 16KB, 32KB, or 56KB.

 

Guest LAN

 

z/VM offers a virtualized LAN for communication between the z/VM guest machines (z/VM 4.2 and later). This type of LAN is known as a guest LAN. The guest LAN resource is created via the CP command ‘DEFINE LAN name’ and can be either type ‘QDIO’ (asynchronous data delivery) or ‘hiper’ (synchronous data delivery). The guest LAN of type ‘hiper’ inherits the MTU capabilities of the real HiperSockets.

 

To access a guest LAN, a virtual NIC is required, which is created via the CP command ‘DEFINE NIC vdev’ with a type matching the guest LAN, or via a NICDEF statement in the guest’s CP Directory entry. GuestLAN access can be restricted to certain userids.

 

Virtual Switch

 

To overcome the burden of routing when a GuestLAN is connected to an external LAN segment, the z/VM Virtual Switch (VSWITCH) was introduced in z/VM 4.4. The VSWITCH attaches the z/VM guest machines with an OSA card to the System z external LAN, implementing one subnet across the hosts on the LAN and the virtual machines. The Virtual Switch is created via the CP command ‘DEFINE VSWITCH name’; it needs a controlling virtual machine (TCP/IP server). For a complete discussion on GuestLAN and VSWITCH, see “Planning and Implementing VSWITCH for Linux/390 Guests” from z/Journal at www.zjournal.com/index.cfm?section=article&aid=315.

 

Configuration Samples

 

In Figures 7 and 8, the command output was taken from a Red Hat Enterprise Linux (RHEL) installation. In RHEL, you must specify the device driver used for an interface in /etc/modprobe.conf. The Novell SUSE (SLES) distribution doesn’t require qeth configuration in /etc/modprobe.conf. Each interface has a configuration file; the configuration file for ‘eth0’ is shown in Figure 8. The configuration file for an interface serves two main purposes: first, to group the device’s subchannels into a qeth device, and second, to specify the IP networking attributes. For Novell SUSE hardware device and interface configuration, refer to the respective configuration files, such as /etc/sysconfig/hardware/hwcfg-qeth-bus-ccw-0.0.f5f0 and /etc/sysconfig/network/ifcfg-qeth-bus-ccw-0.0.f5f0.
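
The two purposes of an interface configuration file can be illustrated with a small parser. The sample below is a hypothetical RHEL-style ifcfg-eth0, not the content of Figure 8: the IP values and the second and third subchannel addresses (0.0.f5f1, 0.0.f5f2) are assumptions, and the function names are illustrative. The sketch simply separates the settings that group the subchannels into a qeth device from the IP networking attributes.

```python
# Hypothetical configuration text; all values are assumptions for illustration.
SAMPLE_IFCFG = """\
DEVICE=eth0
BOOTPROTO=static
IPADDR=10.1.0.5
NETMASK=255.255.0.0
NETTYPE=qeth
SUBCHANNELS=0.0.f5f0,0.0.f5f1,0.0.f5f2
ONBOOT=yes
"""


def parse_ifcfg(text):
    """Parse simple KEY=value lines, skipping blank lines and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def split_settings(settings):
    """Separate device grouping information from IP networking attributes."""
    subchannels = [s for s in settings.get("SUBCHANNELS", "").split(",") if s]
    grouping = {"driver": settings.get("NETTYPE"), "subchannels": subchannels}
    ip_keys = ("BOOTPROTO", "IPADDR", "NETMASK")
    ip_attributes = {key: settings[key] for key in ip_keys if key in settings}
    return grouping, ip_attributes


grouping, ip_attributes = split_settings(parse_ifcfg(SAMPLE_IFCFG))
print(grouping)
print(ip_attributes)
```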

 

Summary

 

The OSA-Express2 features (Gigabit Ethernet, 10 Gigabit Ethernet) provide connectivity to clients and servers using 1Gbps or 10Gbps LANs. They’re System z integrated hardware features installed in an I/O cage, making them integral components of the server I/O subsystem. With the OSA-Express functions delivered since June 1999, you have the connectivity, bandwidth, availability, reliability, and recovery that you’ve come to expect from the mainframe.

 

HiperSockets and GuestLAN provide similar connectivity between LPARs or virtual servers in a central processor complex. VSWITCH in conjunction with a real hardware switch lets your Linux/390 systems participate on an external LAN segment. Linux on System z perfectly exploits the System z networking equipment, using the special IP assists of OSA in Layer 3 mode and supporting use of non-IP protocols in Layer 2 mode.


 
   
 
ARTICLE INFO
ISSUE:
DEPTS: Linux on Syste




ABOUT THE AUTHORS

Wolfgang Gellerich
email: gellerich@de.ibm.com...

 


Klaus-Dieter Wacker
email: kdwacker@de.ibm.com

 


 

©2010 Thomas Communications, Inc.