Data Center 101: Server Virtualization

Virtualization is a key piece of modern data center design. It occurs on many devices within the data center; conceptually, virtualization is the ability to create multiple logical devices from one physical device. We’ve been virtualizing hardware for years: VLANs and VRFs on the network, volumes and LUNs on storage, and even our servers as far back as the 1970s with LPARs. Server virtualization hit the mainstream in the data center when VMware began effectively partitioning clock cycles on x86 hardware, allowing virtualization to move from big iron to commodity servers.

This post is the next segment of my Data Center 101 series and will focus on server virtualization, specifically virtualizing x86/x64 server architectures. If you’re not familiar with the basics of server hardware, take a look at ‘Data Center 101: Server Architecture’ (http://www.definethecloud.net/?p=376) before diving in here.

What is server virtualization:

Server virtualization is the ability to take a single physical server system and carve it up like a pie (mmmm pie) into multiple virtual hardware subsets. 

Each Virtual Machine (VM), once created, or carved out, operates in a similar fashion to an independent physical server. Typically each VM is provided with a set of virtual hardware on which an operating system and a set of applications can be installed as if it were a physical server.

Why virtualize servers:

Virtualization has several benefits when done correctly:

  • Reduction in infrastructure costs, due to less required server hardware.
    • Power
    • Cooling
    • Cabling (dependent upon design)
    • Space
  • Availability and management benefits
    • Many server virtualization platforms provide automated failover for virtual machines.
    • Centralized management and monitoring tools exist for most virtualization platforms.
  • Increased hardware utilization
    • Standalone servers traditionally suffer from utilization rates as low as 10%.  By placing multiple virtual machines with separate workloads on the same physical server, much higher utilization rates can be achieved.  This means you’re actually using the hardware you purchased, and are powering/cooling (see the sketch below.)
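
To put rough numbers on that last point, here is a back-of-the-envelope sketch; the server count, the 10% average, and the 60% target are illustrative assumptions, not measurements:

    import math

    # 20 standalone servers averaging 10% utilization represent only
    # 2 servers' worth of actual work.
    standalone_servers = 20
    avg_utilization = 0.10        # the low rate cited above
    target_utilization = 0.60     # leave headroom for workload peaks

    actual_work = standalone_servers * avg_utilization
    hosts_needed = math.ceil(actual_work / target_utilization)
    print(f"{standalone_servers} standalone servers -> {hosts_needed} virtualization hosts")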

How does virtualization work?

Typically within an enterprise data center, servers are virtualized using a bare-metal hypervisor: a virtualization operating system that installs directly on the server without the need for a supporting operating system. In this model the hypervisor is the operating system and the virtual machine is the application.

Each virtual machine is presented a set of virtual hardware upon which an operating system can be installed.  The fact that the hardware is virtual is transparent to the operating system.  The key components of a physical server that are virtualized are:

  • CPU cycles
  • Memory
  • I/O connectivity
  • Disk

At a very basic level, memory and disk capacity, I/O bandwidth, and CPU cycles are shared amongst the virtual machines. This allows multiple virtual servers to utilize a single physical server’s capacity while maintaining a traditional OS-to-application relationship. The reason this does such a good job of increasing utilization is that you’re spreading several applications across one set of hardware. Applications typically peak at different times, allowing for a more constant state of utilization.
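
As a toy illustration of that carving, the sketch below checks whether a set of hypothetical VMs fits within one host’s capacity; every name and number here is made up for the example:

    # One physical host carved into virtual hardware subsets.
    host = {"cpu_cores": 16, "ram_gb": 64, "disk_gb": 1200}

    vms = {
        "mail":   {"cpu_cores": 4, "ram_gb": 16, "disk_gb": 300},
        "backup": {"cpu_cores": 4, "ram_gb": 8,  "disk_gb": 600},
        "dhcp":   {"cpu_cores": 1, "ram_gb": 1,  "disk_gb": 20},
    }

    for resource, capacity in host.items():
        allocated = sum(vm[resource] for vm in vms.values())
        state = "OK" if allocated <= capacity else "oversubscribed"
        print(f"{resource}: {allocated}/{capacity} allocated ({state})")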

For example, imagine an email server: typically it will peak at 9am, possibly again after lunch, and once more before quitting time. The rest of the day it’s greatly underutilized (that’s why marketing email is typically sent late at night.) Now picture a traditional backup server; these historically run at night, when other servers are idle, to prevent performance degradation. In a physical model each of these servers would have been architected for peak capacity to support the maximum load, but most of the day they would be underutilized. In a virtual model they can both run on the same physical server and complement one another due to their varying peak times.
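
A small sketch makes the complementary-peak argument concrete. The hourly demand curves below are invented for illustration, but they show why two servers each sized for their own peak can share a single host:

    # Hour-by-hour CPU demand (fraction of one host) for two hypothetical
    # workloads: email peaks during the workday, backup runs overnight.
    email  = [0.1] * 8 + [0.8, 0.5, 0.4, 0.4, 0.7, 0.4, 0.3, 0.3, 0.6] + [0.1] * 7
    backup = [0.8] * 5 + [0.1] * 14 + [0.8] * 5

    # Sized separately, each needs a host able to handle a 0.8 peak.
    # Shared, the worst combined hour still fits within one host.
    combined_peak = max(e + b for e, b in zip(email, backup))
    print(f"email peak: {max(email)}, backup peak: {max(backup)}")
    print(f"worst combined hour on a shared host: {combined_peak:.1f}")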

Another use of virtualization is hardware refresh. DHCP servers are a great example: they provide an automatic IP addressing service by leasing IP addresses to requesting hosts, with leases typically held for 30 days. DHCP is not an intensive workload. In a physical server environment it wouldn’t be uncommon to have two or more physical DHCP servers for redundancy. Because of the light workload, these servers would use minimal hardware, for instance:

  • 800MHz processor
  • 512MB RAM
  • 1x 10/100 Ethernet port
  • 16GB internal disk

If this physical server were 3-5 years old, replacement parts and service contracts would be hard to come by; additionally, because of hardware advancements, the server may be more expensive to keep than to replace. When looking for a refresh, the same hardware would not be available today; a typical minimal server today would be:

  • 1+ GHz dual- or quad-core processor
  • 1GB or more of RAM
  • 2x onboard 1GE ports
  • 136GB internal disk

The application requirements haven’t changed, but hardware has moved on. Refreshing the same DHCP server with new hardware therefore results in even greater underutilization than before. Virtualization solves this by placing the same DHCP server on a virtualized host, tuning the virtual hardware to the application requirements while sharing the physical resources with other applications.

Summary:

Server virtualization has a great many benefits in the data center, and as such companies are adopting more and more virtualization every day. The overall reduction in overhead costs such as power, cooling, and space, coupled with the increased hardware utilization, makes virtualization a no-brainer for most workloads. Depending on the virtualization platform chosen, there are the additional benefits of increased uptime, distributed resource utilization, and increased manageability.

Data Center 101: Local Area Network Switching

Interestingly enough, 2 years ago I couldn’t even begin to post an intelligent blog on Local Area Networking 101; funny how things change. That being said, I make no guarantees that this post will be intelligent in any way. Without further ado, let’s get into the second part of the Data Center 101 series and discuss the LAN.

I find the best way to understand a technology is to have a grasp of its history and the problems it solves, so let’s take a minute to dive into the history of the LAN. For the sake of simplicity and real-world applicability I’m going to stick to Ethernet, as it is the predominant LAN technology in today’s data center environments. Before we even go into the history, we’ll define Ethernet and where it fits in the OSI model.

Ethernet:

Ethernet is a frame-based networking technology comprising a set of standards for Layers 1 and 2 of the OSI model. Ethernet devices use an address called a Media Access Control (MAC) address for communication. MAC addresses form a flat address space which is not routable (it can only be used on a flat Layer 2 network) and are composed of several components, most importantly a vendor ID known as an Organizationally Unique Identifier (OUI) and a unique address for the individual port.
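
A quick way to internalize that structure is to split an address apart. The sketch below uses an arbitrary example address; the first three octets are the OUI, the last three the per-port unique portion:

    # Split a MAC address into its vendor (OUI) and device-specific parts.
    mac = "00:1B:21:3A:4F:9E"        # arbitrary example address
    octets = mac.split(":")

    oui = ":".join(octets[:3])       # first 3 octets: vendor OUI
    device = ":".join(octets[3:])    # last 3 octets: unique per port
    print(f"OUI: {oui}, device portion: {device}")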

OSI Model:

The Open Systems Interconnection (OSI) model is a subdivision of the components of communication that is used as a tool to create interoperable network systems, and it is a fantastic model for learning networks. The OSI model breaks into 7 layers, much like my favorite taco dip.

Understanding the OSI model and where protocols and hardware fit into it will not only help you learn but also help with understanding new technologies and how they fit together. I often revert to placing concepts in terms of the OSI model when having highly technical discussions about new concepts and technology. The beauty of the model is that it allows for easy interoperability and flexibility. For instance, Ethernet is still Ethernet whether you use fiber cables or copper cables, because only Layer 1 is changing.

Ethernet LAN History:

As the LAN networks we use today evolved, they typically started with individual groups within an organization. For instance, a particular group would have a requirement for a database server and would purchase a device to connect that group. That device was commonly a hub.

Hub:

A network hub is a device with multiple ports used to connect several devices for the purposes of network communication. When an Ethernet hub receives a frame it replicates it to all connected ports except the one it was received on, in a process called flooding. All connected devices receive a copy of the frame and will typically only process it if the destination MAC address is their own (there are exceptions to this which are beyond the scope of this discussion.)
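
In code, a hub’s entire forwarding logic fits in one line, since it knows nothing about addresses. A minimal sketch:

    # A hub replicates every frame to all ports except the ingress port.
    def hub_forward(ingress_port, ports):
        return [p for p in ports if p != ingress_port]

    print(hub_forward(ingress_port=1, ports=[1, 2, 3, 4]))  # -> [2, 3, 4]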

In the diagram above you see a single device sending a frame and that frame being flooded to all other active ports. This works quite well for small networks consisting of a single hub and a low port count, but you can easily see where problems start to arise as the network grows.

Once multiple hubs are connected and the network grows each hub will flood every frame, and all devices will receive these frames regardless of whether they are the intended recipient.  This causes major overhead in the network due to the unneeded frames consuming bandwidth.

Bridge:

The next step in the network’s evolution, called bridging, was designed to alleviate this problem and decrease the overhead of forwarding unneeded frames. A bridge is a device that makes an intelligent decision on when and where to forward frames, based on MAC addresses stored in a table. These MAC addresses can be static (manually input) or dynamic (learned on the fly.) Because it is more common, we will focus on dynamic learning. The original bridges typically had 2 or more ports (low port counts) and could separate MAC addresses using the table for those ports.

In the above diagram you see a hub on the left operating normally, flooding the frame to all active ports. When the frame is received by the bridge, a lookup is done on the MAC table and the bridge decides whether or not to flood the frame to the other side of the network. Because the frame in this example is destined for a MAC address on the left side of the network, the bridge does not flood it. These addresses are learned dynamically as devices send frames. If the destination MAC address had been a device on the right side of the network, the bridge would have sent the frame to that side to be flooded by the hub.

Bridges reduced unnecessary network traffic between groups or departments while allowing resource sharing when needed. The limitation of the original bridges came from their low port counts and changing data patterns. Because bridges were typically only separating 2-4 networks, there was still quite a bit of flooding, especially as more and more resources were shared across groups.

Switches:

Switches are the next evolution of the bridge, and the operation they perform is still considered bridging. In very basic terms, a switch is a high-port-count bridge that is able to make decisions on a port-by-port basis. A switch maintains a MAC table and only forwards frames to the appropriate port based on the destination MAC. If the switch has not yet learned the destination MAC it will flood the frame. Switches and bridges will also flood multicast (traffic destined for multiple recipients) and broadcast (traffic destined for all recipients) frames, which are beyond the scope of this discussion.

In the diagram above I have added several components to clarify switching operations now that we are familiar with basic bridging. Starting in the top left of the diagram you see some of the information contained in the header of an Ethernet frame; in this case it is the source and destination MAC addresses of two of the devices connected to the switch. Each end-point in the diagram is labeled with a MAC address starting with AF:AF:AF:AF:AF. In the top right we see a representation of a MAC table, which is stored on the switch and learned dynamically. The MAC table contains a listing of which MAC addresses are known to be on each port. Because the MAC table in this example is fully populated, we can assume the switch has previously seen a frame from each device. That auto-population is the ‘dynamic learning,’ and it is done by recording the source MAC address of incoming frames. Lastly, we see that the frame being sent by the device on port 1 is only being forwarded to the device on port 2. Had port 2’s MAC address not yet been learned, the switch would be forced to flood the frame to all ports except the one it was received on in order to ensure it reached the destination device.
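
That learn-then-forward behavior is simple enough to sketch directly. The class below is a minimal toy model, not any vendor’s implementation; the MACs and port numbers just follow the diagram’s convention:

    # Minimal learning switch: record source MACs, forward on a table hit,
    # flood on a miss.
    class Switch:
        def __init__(self, ports):
            self.ports = ports
            self.mac_table = {}                      # MAC -> port

        def receive(self, src_mac, dst_mac, ingress_port):
            self.mac_table[src_mac] = ingress_port   # dynamic learning
            if dst_mac in self.mac_table:            # known: forward
                return [self.mac_table[dst_mac]]
            return [p for p in self.ports if p != ingress_port]  # flood

    sw = Switch(ports=[1, 2, 3, 4])
    print(sw.receive("AF:AF:AF:AF:AF:01", "AF:AF:AF:AF:AF:02", 1))  # flood
    print(sw.receive("AF:AF:AF:AF:AF:02", "AF:AF:AF:AF:AF:01", 2))  # port 1 only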

So far we’ve learned that bridges improved upon hubs, and switches improved upon basic bridging.  The next kink in the evolution of Ethernet LANs came as our networks grew beyond single switches and we began adding in redundancy.

The three issues that arose can all be grouped as problems with network loops (specifically Layer 2 Ethernet loops.)  These issues are:

Multiple Frame Copies:

When a device receives the same frame more than once due to replication or loop issues, that is a multiple frame copy. Multiple copies can cause issues for some hardware and software, and they consume additional, unnecessary bandwidth.

MAC Address Instability:

When a switch must repeatedly change its MAC table entry for a given device this is considered MAC address instability.

Broadcast Storms:

Broadcast storms are the most serious of the three issues, as they can literally bring all traffic to a halt. If you ask someone who has been doing networking for quite some time how they troubleshoot a broadcast storm, you are quite likely to hear ‘Unplug everything and plug things back in one at a time until you find the offending device.’ The reason for this is that in the past the storm itself would soak up all available bandwidth, leaving no means to access the switching equipment in order to troubleshoot the issue. Most major vendors now provide protection against this level of problem, but storms are still a serious issue that can have a major performance impact on production data. Broadcast storms are caused when a broadcast, multicast, or flooded frame is repeatedly forwarded and replicated by one or more switches.

In the diagram above we can see a switched loop. We can also observe several stages of frame forwarding, starting with device 1 in the top left sending a frame to device 2 in the top right.

  1. Device 1 forwards a frame to device 2.  This one-to-one communication is known as unicast.
  2. The switch on the top left does not yet have device 2 in its MAC table therefore it is forced to flood the frame, meaning replicate the frame to all ports except the one where it was received. 
  3. In stage three we see two separate things occur:
    1. The switch in the top right delivers the frame to the intended device (for simplicity’s sake we are assuming the switch in the top right already has a MAC table entry for the device.)
    2. The bottom switch, having received the frame, forwards it to the switch in the top right.
  4. The switch in the top right receives the second copy and forwards it based on its MAC table, delivering a second copy of the same frame to device 2.

 

The above example has a little more going on and can become confusing quickly.  For the purposes of this example assume all three switches have blank MAC address tables with no devices known.  Also remember that they are building the MAC table dynamically based on the source MAC address they see in a frame.  To aid in understanding I will fill out the MAC tables at each step.

1. Our first stage is the easy one.  Device 1 forwards a unicast frame to device 2.  Switch A receives this frame on the top port.

2. When switch A receives the frame it checks its MAC table for the correct port to forward frames to device 2.  Because its MAC table is currently blank it must flood the frame (replicate it to all ports except the one where it was received.)  As it floods the frame it also records the MAC address and attached port of device 1 because it has seen this MAC as the source in the frame.

3. In stage 3 two switches receive the frame and must make decisions. 

  1. Switch C, having a blank MAC table, must flood the frame.  Because there is only one port other than the one the frame was received on, switch C floods it to the only available port; at the same time it records the source MAC address as having been received on its port 1.
  2. Switch B also receives the frame from switch A and must make a decision.  Like switch C, switch B has no records in its MAC table and must flood the frame.  It floods the frame down to switch C and up to device 2.  At the same time switch B records the source MAC in its MAC table.

4. In the fourth stage we again have several things happening. 

  1. Switch C has received the same frame for a second time, this time on port 2.  Because it still has not seen the destination device it must flood the frame.  Additionally, because this is the exact same frame, switch C now sees the MAC address of device 1 coming from its right port, port 2, and assumes the device has moved.  This forces switch C to change its MAC table.
  2. At the same time, switch B receives another copy of the frame.  Switch B, seeing the same source address, must change its MAC table, and because it still does not have the destination MAC in the table it must flood the frame again.

In the above diagram, pay close attention to the fact that the MAC tables have changed for switches B and C. Because they saw the same frame come from a different port, they must assume the device has moved and change the table. Additionally, because the cycle has not been completed, the loop will continue; this is one way broadcast storms begin. More and more of these endless loops hit the network until there is no bandwidth left to serve data frames.
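
A toy loop model shows why this never settles down on its own. One direction of travel around the triangle is simulated below; since Layer 2 frames carry no TTL, nothing ever removes the circulating copy:

    # A flooded frame circles the three-switch triangle indefinitely,
    # re-flooding a duplicate toward the attached hosts on every hop.
    switches = ["A", "B", "C"]
    duplicates = 0
    for hop in range(6):                 # a real storm never stops
        current = switches[hop % 3]
        duplicates += 1
        print(f"hop {hop + 1}: switch {current} re-floods "
              f"({duplicates} duplicate deliveries so far)")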

In this simple example it may seem that the easy solution is to not build loops like the triangle in my diagram.  This is actually the premise of the next Ethernet evolution we’ll discuss, but first let’s look at how easy it is to create loops just by adding redundancy.

In the diagram above we start with a non-redundant switch link. This link is a single point of failure, and in the event a component fails, devices on separate switches will be unable to communicate. The simple solution is adding a second link for redundancy, with the assumed added benefit of more bandwidth. In reality, without another mechanism in place, adding the second link turns the physical view on the bottom left into the logical view on the bottom right, which is a loop. This is where the next evolution comes into play.

Spanning-Tree Protocol (STP):

STP is defined in IEEE 802.1D and provides an automated method for building loop-free topologies based on a very simple algorithm (sketched below.) The premise is to allow the switches to automatically configure a loop-free topology by placing redundant links in a blocked state. Like a tree, this loop-free topology is built up from the root (root bridge) and branches out (switches) to the leaves (end-nodes), with only one path to each end-node.

The way spanning-tree does this is by detecting redundant links and placing them in a ‘blocked’ state, meaning those ports do not send or receive frames. In the event of a primary link (designated port) failure, the blocked port is brought online. The issue with spanning-tree is twofold:

  • Because it blocks ports to prevent loops potential bandwidth is wasted.
  • In failure events, spanning-tree can take up to 50 seconds to bring the blocked port into an active state, meaning a potential 50 seconds of downtime for the link.
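
For intuition, here is a highly simplified sketch of the algorithm over the redundant triangle from the diagrams. Real STP elects the root through BPDU exchange and weighs bridge priorities and path costs; this toy version just takes the lowest switch ID as root and keeps one shortest path to it:

    from collections import deque

    # Redundant triangle: every switch links to the other two.
    neighbors = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
    all_links = {tuple(sorted((sw, n))) for sw, ns in neighbors.items() for n in ns}

    root = min(neighbors)                  # lowest bridge ID becomes root
    parent, seen, queue = {}, {root}, deque([root])
    while queue:                           # BFS builds the loop-free tree
        sw = queue.popleft()
        for n in neighbors[sw]:
            if n not in seen:
                seen.add(n)
                parent[n] = sw
                queue.append(n)

    forwarding = {tuple(sorted(link)) for link in parent.items()}
    print(f"root: {root}, forwarding: {sorted(forwarding)}, "
          f"blocked: {sorted(all_links - forwarding)}")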

Multiple versions of STP have been implemented and standardized to improve upon the original 802.1D specification. These include:

Per-VLAN Spanning-Tree Protocol (PVSTP):

Performs the blocking algorithm independently for each VLAN allowing greater bandwidth utilization.

Rapid Spanning-Tree Protocol (RSTP):

Uses additional port-types not in the original STP specification to allow faster convergence during failure events.

Per-VLAN Rapid Spanning-Tree (PVRSTP):

Provides rapid spanning-tree functionality on a per VLAN basis.

Other STP implementations exist, and the details of STP operation in each of its flavors are beyond the scope of what I intend to cover in the 101 series. If there is demand, these concepts may be covered in a more in-depth 202 series once this series is completed.

Summary:

Ethernet networking has evolved quite a bit over the years and is still a work in progress. Understanding the hows and whys of where we are today will help in understanding the advancements that continue to come. If you have any comments, questions, or corrections, please leave them in the comments or contact me in any of the ways listed on the about page.

Data Center 101: Server Systems

As the industry moves deeper and deeper into virtualization, automation, and cloud architectures, it forces us as engineers to break free of our traditional silos. For years many of us were able to do quite well being experts in one discipline with little to no knowledge of another. Cloud computing, virtualization, and other current technological and business initiatives are forcing us to branch out beyond our traditional knowledge set and understand more of the data center architecture as a whole.

It was this concept that gave me the idea to start a new series on the blog covering the foundation topics of each of the key areas of the data center. These will be lessons designed from the ground up to give you a familiarity with a new subject or a refresher on an old one. Depending on your background, some, none, or all of these may be useful to you. As we get further through the series I will be looking for experts to post on subjects I’m not as familiar with; WAN and security are two that come to mind. If you’re interested in writing a beginner’s lesson on one of those topics, or any other, please comment or contact me directly.

Server Systems:

As I’ve said in previous posts, the application is truly the heart of the data center. Applications are the reason we build servers, networks, and storage systems; they are the email systems, databases, web content, etc. that run our businesses. Applications run within the confines of an operating system, which interfaces directly with server hardware and firmware (discussed later) and provides a platform to run the application. Operating systems come in many types, commonly Unix, Linux, and Windows, with other variants used for specialized purposes such as mainframes and supercomputers.

Because the server sits closer to the application than any other hardware, understanding server hardware and functionality is key. Server hardware breaks down into several major components and concepts. For this discussion we will stick with the more common AMD/Intel architecture, known as x86.

    System board (motherboard): All components of a server connect via the system board. The system board itself is a circuit board with specialized connectors for the server’s subcomponents, and it provides connectivity between each component of the server.
    Central Processing Unit (CPU): The CPU is the workhorse of the server system, performing the calculations that allow the operating system and application to run. Whatever work is being done by an application is being processed by the CPU. A CPU is placed in a socket on the system board; each socket holds one CPU.
    Random Access Memory (RAM): RAM is where data being used by the operating system and application, but not currently being processed, is stored. For instance, the term ‘load’ typically refers to moving data from permanent storage (disk) into memory, where it can be accessed faster. Memory is electronic and can be accessed very quickly, but it requires active power to maintain data, which is why it is known as volatile.
    Disk: Disk is permanent storage media traditionally comprised of magnetic platters known as disks. Other types of disk exist, including flash disks, which provide much greater performance at a higher cost. The key to disk storage is that it is non-volatile: it does not require power to maintain data.

    Disk can either be internal to the server or external in a separate device. Commonly, server disk is consolidated in central storage arrays attached by a specialized network or network protocol. Storage and storage networks will be discussed later in this series.

    Input/Output (I/O): I/O comprises the methods of getting data in and out of the server. I/O comes in many shapes and sizes, but the two primary methods used in today’s data centers are Local Area Networks (LAN) using Ethernet as the underlying protocol, and Storage Area Networks (SAN) using Fibre Channel as the underlying protocol (both will be discussed later in this series.) These networks attach to the server through I/O ports typically found on expansion cards.
    System bus: The system bus is the series of paths that connect the CPU to memory. It is specific to the CPU vendor.
    I/O bus: The I/O bus is the path that connects the expansion cards (I/O cards) to the CPU and memory. Several standards exist for these connections, allowing multiple vendors to interoperate without issue. The most common bus type for modern servers is PCI Express (PCIe), which supports greater bandwidth than previous bus types, allowing higher-bandwidth networks to be used.
    Firmware: Firmware is low-level software that is commonly hard-coded onto hardware chips. Firmware runs the hardware device at a low level and interfaces with the BIOS. In most modern server components the firmware can be updated through a process called ‘flashing.’
    Basic I/O System (BIOS): The BIOS is a type of firmware stored in a chip on the system board. It is the first code loaded when a server boots and is primarily responsible for initializing hardware and loading an operating system.
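
To tie these components to a running system, here is a quick inventory sketch using the third-party psutil library (it must be installed separately, and NIC names and mount points will vary by machine):

    # Inspect the major server components described above on a live system.
    # Requires the third-party psutil library (pip install psutil).
    import psutil

    print(f"logical CPUs: {psutil.cpu_count()}")
    print(f"physical cores: {psutil.cpu_count(logical=False)}")
    print(f"RAM: {psutil.virtual_memory().total / 2**30:.1f} GiB")
    print(f"disk on /: {psutil.disk_usage('/').total / 2**30:.1f} GiB")
    print(f"I/O ports (NICs): {sorted(psutil.net_if_addrs())}")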

Server

The diagram above shows a two-socket server. Starting at the bottom you can see the disks, in this case internal Hard Disk Drives (HDDs.) Moving up you can see two sets of memory and CPUs, followed by the I/O cards and power supplies. The power supplies convert AC power to the appropriate DC levels for use in the system. Not shown are the fans that move air through the system for cooling.

The bus systems, which are not shown, would be a series of traces and chips on the system board allowing separate components to communicate.

A Quick Note About Processors:

Processors come in many shapes and sizes and were traditionally rated by speed, measured in hertz. Over the last few years a new concept has been added to processors: ‘cores.’ Simply put, a core is a CPU placed on a chip beside other cores, with each sharing certain components such as cache and the memory controller (both outside the scope of this discussion.) If a processor has 2 cores it will operate as if it were 2 physically independent, identical processors, and provide the advantages of such.

Another technology, hyper-threading, has been around for quite some time. A processor can traditionally only process one calculation per cycle (measured in hertz); this is known as a thread. Many of these processes only use a small portion of the processor itself, leaving other portions idle. Hyper-threading allows a processor to schedule 2 processes in the same cycle as long as they don’t require overlapping portions of the processor. For applications able to utilize multiple threads, hyper-threading provides an average performance increase of approximately 30%, whereas a second core would double performance.

Hyper-threading and multiple cores can be used together, as they are not mutually exclusive. For instance, in the diagram above, if both installed processors were 4-core processors that would provide 8 total cores; with hyper-threading enabled, it would provide a total of 16 logical cores, as the sketch below shows.
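
The arithmetic is worth spelling out, using the two-socket example above:

    # Logical-core math for a two-socket server with quad-core CPUs.
    sockets = 2
    cores_per_socket = 4
    threads_per_core = 2                          # hyper-threading enabled

    physical = sockets * cores_per_socket         # 2 * 4 = 8 cores
    logical = physical * threads_per_core         # 8 * 2 = 16 logical cores
    print(f"{physical} physical cores, {logical} logical cores")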

Not all applications and operating systems can take advantage of multiple processors and cores, therefore it is not always advantageous to have more cores or processors.  Proper application sizing and tuning is required to properly match the number of cores to the task at hand.

Server Startup:

When a server is first powered on, the BIOS is loaded from EEPROM (Electrically Erasable Programmable Read-Only Memory) located on the system board. While the BIOS is in control it performs a series of Power-On Self-Tests (POST), ensuring the basic operability of the main system components. From there it detects and initializes key components such as keyboard, video, and mouse. Last, the BIOS searches through the available bootable media for a device containing a valid Master Boot Record (MBR.) It then loads that record and allows its code to take over and load the operating system.

The order and devices the BIOS searches is configurable in the BIOS settings.  Typical boot devices are:

  • CD/DVD-ROM
  • USB
  • Internal Disk
  • Internal Flash
  • iSCSI SAN
  • Fibre Channel SAN

Boot order is very important when there is more than one available boot device, for instance when booting to a CD-ROM to perform recovery of an installed operating system. It is also important to note that both iSCSI and Fibre Channel network-connected disks are handled by the operating system as if they were internal Small Computer System Interface (SCSI) disks. This becomes very important when configuring non-local boot devices. SCSI as a whole will be covered later in this series.

Operating System:

Once the BIOS is done getting things ready, it transfers control to the bootable data in the MBR, and that code takes over and loads the operating system (OS.) The OS is the interface between the user/administrator and the server hardware. The OS provides a common platform for various applications to run on and handles the interface between those applications and the hardware. In order to properly interface with hardware components the OS requires drivers for that hardware. Essentially, drivers are OS-level software that allow any application running in the OS to properly interface with the firmware running on the hardware.

Applications:

Applications come in many different forms to provide a wide variety of services. Applications are the core of the data center and are typically the most difficult piece to understand. Each application, whether commercially available or custom built, has unique requirements, with different considerations for processor, memory, disk, and I/O. These considerations become very important when looking at new architectures, because any change in the data center can have a significant effect on application performance.

Summary:

The server architecture goes from the I/O inputs through the server hardware to the application stack. A proper understanding of this architecture is vital, since application performance depends on it and applications are the purpose of the data center. Servers consist of a set of major components: CPUs to process data, RAM to store data for fast access, I/O devices to get data in and out, and disk to store data permanently. This system is put together for the purpose of serving an application.

This post is the first in a series intended to build a foundation in data center concepts. If you’re starting from scratch they may all be useful; if you’re familiar with one or two aspects, pick and choose. If this series becomes popular I may do a 202 series as a follow-on. If I missed something here, or made a mistake, please comment. Also, if you’re a subject matter expert in a data center area who would like to contribute a foundation blog in this series, please comment or contact me.
