Blades are Not the Future

Kevin Houston, founder of Blades Made Simple and all-around server and blade rocket surgeon, posted an excellent, thought-provoking article titled 'Why Blade Servers Will Be the Core of Future Data Centers' (http://bladesmadesimple.com/2011/10/why-blade-servers-will-be-the-core-of-future-data-centers/).  The article lays out his predictions on where the server industry is headed.  Kevin walks through several stages of blade server evolution he believes could be coming:

  1. Less I/O expansion, basically less switching infrastructure required in the chassis due to increased bandwidth.
  2. More on-board storage options, possibly utilizing the space reclaimed from I/O modules.
  3. External I/O expansion options such as those offered by Aprius and Xsigo.
  4. Going fully modular at the rack level, extending the concept of a blade chassis to rack size and adding shelves of PCIe devices, storage, and servers.

I jokingly replied to him that he'd invented the 'rack-mount' server: the blades are no longer in a blade chassis but inserted directly into a rack, accessing external storage in the same rack and connecting to shared resources (PCIe) in that rack.  In reality, Kevin's vision is closer to a mainframe than a rack-mount.

Overall, while I enjoyed Kevin's post as a thought experiment, I think his vision of the data center future is way off from where we're headed.  To start with, I don't think blades are the solution for every problem today.  I've previously summarized my thoughts on that, along with some bad Shakespeare prose, in a guest post on my friend Thomas Jones' site (http://niketown588.com/2010/09/08/to-blade-or-not-to-blade/), basically stating that blades aren't the right tool for every job.

Additionally, I don't see blades as the long-term future of enterprise and larger-scale computing.  I look to the way Microsoft, Google, Amazon, and Facebook do their computing as the future: cheap commodity rack-mounts en masse, and I see the industry transitioning that way.  Blades (as we use them today) don't hold water in that model due to cost, complexity, proprietary nature, etc.  Blades are designed to save space and to be highly available, but as we build our data centers to scale and design our applications for reliability on cloud platforms, highly available server hardware becomes irrelevant.  No service is lost when one of the thousands of servers handling Bing search fails; a new server is put in its place and joins the pool of available resources.

Even if blades, or some transformation of them, were the future, I don't see it playing out the way Kevin does.  I think Kevin's end concept is built on two shaky assumptions: external I/O appliances and blade chassis storage.

Let's start with chassis-based storage (i.e., shared storage in the blade chassis).  This is something I've never been a fan of, as it limits access to the shared disk to a single chassis, meaning 14 blades max… wait, fewer than 14 blades because it uses blade slots to provide the disk.  At very small scale this may make sense because you have an 'all-in-one' chassis, but the second you outgrow it (oops, my business got bigger) you're stuck with small silos of data.

The advantage of this approach, however, is low-latency access and high bandwidth availability across the blade back/mid-plane.  That makes it a more interesting option now with lightning-fast SSDs and cache options: you can have extremely high-performance storage within the blade chassis, which provides a lot of options for demanding applications.  In those instances local storage in the chassis will be a big hit, but it will not be the majority of deployments without additional features such as EMC's 'Project Lightning' (http://www.emc.com/about/news/press/2011/20110509-05.htm) to free the trapped data from the confines of the chassis.

Next we have external I/O appliances… These have been on my absolute hate list since the first time I saw them.  Kevin suggests a device based on industry standards, but current versions are fully proprietary and require not only the vendor's appliance but also the vendor's cards in either the appliance or the server; that's the first nightmare.  Beyond that, these devices create a single point of failure for the I/O devices of multiple servers and sit directly in the I/O path.  You're basically adding complexity, cost, and failure points, and for what?  Let's look at that:

From Aprius's perspective: 'Aprius PCI Express over Ethernet technology extends a server's internal PCIe bus out over the Ethernet network, enabling groups of servers to access and share network-attached pools of PCI Express devices, such as flash solid state storage, at PCIe performance levels' (www.aprius.com).  I'd really like to know how you get 'PCIe performance levels' over Ethernet infrastructure???

And from Xsigo: 'In the Xsigo wire-once infrastructure you connect each server to the I/O Director via a single cable. Then connect networks and storage to the I/O Director. You're now ready to provision Ethernet and Fibre Channel connections from servers to data center resources in real time' (http://xsigo.com/).  Basically, you plug all your I/O into their server/appliance, then cable it to your server via InfiniBand or Ethernet.  Why???  You're adding a device in-band in order to consolidate storage and LAN traffic?  FCoE, NFS, iSCSI, etc. already do that on standards-based 10GE or 40GE with no in-band appliance.

Kevin mentions this as a way to allow more space in the blades for future memory and processor options.  This makes sense, as HP, IBM, Dell, and Sun designs have already run into barriers with the height of their blades restricting processor options.  This is because the blade size was designed years ago and didn't account for today's larger processors and heat sinks.  Their only workaround is utilizing two blade slots, which consumes too much space per blade.  Newer blade architectures like Cisco UCS take modern processors into account and don't have this limitation, so they don't require I/O offloading to free space.

Lastly, I/O offloading as a whole just stinks to me.  You still have to get the I/O into the server, right?  Which means you'll still have I/O adapters of some type in the server.  With 40GE to the blade shipping this year, why would you require anything else?  The GPU and cache-storage argument?  Sure, go that direction, then explain to me why you'd want to pull those types of devices off the local PCIe bus and use them remotely, adding latency.

Finally, to end my rant: a rack-sized blade enclosure presents a whole lot of lock-in.  You're at the mercy of the vendor you purchase it from for new hardware and support until it's fully utilized.  Sounds a lot like the reason we left mainframes for x86 in the first place, doesn't it?

Thoughts, corrections, comments, and sheer hate mail always appreciated!  What do you think?


Technology Passion

The May 24th IDC report on server market share validated a technology I've been passionate about for some time: the Cisco Unified Computing System (UCS).  For the first time since UCS's launch two years ago, Cisco reported server earnings to IDC, with amazing results: #3 in global blade server market share and 1.6% factory revenue share for servers overall.  Find Kevin Houston's summary of the blade numbers here: http://bladesmadesimple.com/2011/05/q1-2011-idc-worldwide-server-market-shows-blade-server-leader-as/ and the IDC report here: http://www.idc.com/getdoc.jsp?containerId=prUS22841411

This report shows that in two years Cisco has either taken significant market share from incumbents, driven new demand, or both.  Regardless of where the numbers came from, they are impressive; as far as servers go it's close to David-and-Goliath proportions, and it's still playing out with Cisco about 1% behind IBM for the #2 spot.  I have been a 'cheerleader' for UCS for nearly its entire existence, but I didn't start that way.  I describe the transition here: http://www.definethecloud.net/why-cisco-ucs-is-my-a-game-server-architecture

Prior to Cisco UCS I was a passionate IBM BladeCenter advocate: great technology, reliable hardware, and a go-to brand.  When IBM launched the BladeCenter H they worked hard to ensure customer investment protection, and in doing so they anchored the H chassis as a whole: they hindered technical enhancements and created complexity to ensure that the majority of components customers had purchased for the BladeCenter E would be forward compatible.  At the time I liked this concept, and IBM had several great engineering ideas built in that provided real value.

In the same time frame HP released the C-Class blade chassis, which had no forward/backward compatibility with previous HP blade architectures but used that fresh slate to build a world-class platform with the right technology for the time and the scalability to move far into the future.  From a technical perspective I had no choice but to concede HP the victory, but I still whole-heartedly recommended IBM because the technical difference was small enough that IBM's customer investment protection model made them the right big-picture choice in my eyes.

I always work with a default preference, or what I call an 'A-Game,' as described in the link above, but my A-Game is constantly evolving.  As I discover a new technology that will work in the spaces I operate in, I assess it against my A-Game and decide whether it can provide better value to 80% or more of the customer base I work with.  When a technology is capable of displacing my A-Game, I replace it.

Sean McGee (http://www.mseanmcgee.com/) says it better than I can, so I'll paraphrase him: 'I'm a technologist; I work with and promote the best technology I'm aware of, and I can't support a product once I know a better one exists.'

In the same fashion I’ll support and promote Cisco UCS until a better competitor proves itself, and I’m happy to see that customers agree based on the IDC reporting.

For some added fun, here are some great Twitter comments from before the IDC announcement, served with a side of crow:

[Image: Twitter comments]


Inter-Fabric Traffic in UCS

It's been a while since my last post; time sure flies when you're bouncing all over the place, busy as hell.  I've been invited to Tech Field Day next week and need to get back in the swing of things, so here goes.

In order for Cisco's Unified Computing System (UCS) to provide the benefits, interoperability, and management simplicity it does, the networking infrastructure is handled in a unique fashion.  This post will take a look at that unique setup and point out some considerations to focus on when designing UCS application systems.  Because Fibre Channel traffic is designed to use separate physical fabrics, exactly as UCS does, this post will focus on Ethernet traffic only.  This post also focuses on End Host mode; for the second part of this post, which covers switch mode, use this link: http://www.definethecloud.net/inter-fabric-traffic-in-ucspart-ii.  Let's start by taking a look at how this is accomplished:

UCS Connectivity

[Figure: UCS physical connectivity]

In the diagram above we see both UCS rack-mount and blade servers connected to a pair of UCS Fabric Interconnects, which handle the switching and management of UCS systems.  The rack-mount servers are shown connected to Nexus 2232s, which are nothing more than remote line cards of the Fabric Interconnects, known as Fabric Extenders.  Fabric Extenders provide a localized connectivity point (10GE/FCoE in this case) without expanding management points by adding a switch.  Not shown in this diagram are the I/O Modules (IOMs) in the back of the UCS chassis.  These devices act in the same way as the Nexus 2232, meaning they extend the Fabric Interconnects without adding management points or switches.  Next let's look at a logical diagram of the connectivity within UCS.

UCS Logical Connectivity

[Figure: UCS logical connectivity]

In this diagram we see several important things to note about UCS Ethernet networking:

  • UCS is a Layer 2 system meaning only Ethernet switching is provided within UCS.  This means that any routing (L3 decisions) must occur upstream.
  • All switching occurs at the Fabric Interconnect level.  This means that all frame forwarding decisions are made on the Fabric Interconnect and no intra-chassis switching occurs.
  • The only connectivity between the Fabric Interconnects is the cluster links.  Both Interconnects are active from a switching perspective, but the management system, known as UCS Manager (UCSM), is an Active/Standby clustered application.  This clustering occurs across these links.  These links do not carry data traffic, which means there is no inter-fabric communication within the UCS system; A-to-B traffic must be handled upstream.

At first glance, handling all switching at the Fabric Interconnect level looks as though it would add latency (inter-blade traffic must be forwarded up to the Fabric Interconnects then back down to the blade chassis).  While this is true, UCS hardware is designed for low-latency environments such as High Performance Computing (HPC), so all components operate at very low latency.  The Fabric Interconnects themselves operate at approximately 3.2us (microseconds), and the Fabric Extenders operate at about 1.5us.  This means total blade-to-blade time is approximately 6.2us, right in line with or lower than most access layer solutions.  Equally important, with this design switching between any two blades/servers in the system occurs at the same speed regardless of location (consistent, predictable latency).
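To make the arithmetic explicit, here is a quick sketch composing the blade-to-blade path from the approximate per-hop latencies quoted above (these are the post's round numbers, not measured values):

```python
# Rough composition of the blade-to-blade path described above, using the
# approximate per-hop latencies quoted in this post (not measured values).
FEX_LATENCY_US = 1.5   # IOM / Nexus 2232 fabric extender, each direction
FI_LATENCY_US = 3.2    # Fabric Interconnect switching

def blade_to_blade_latency_us():
    """Blade A -> IOM -> Fabric Interconnect -> IOM -> Blade B."""
    return FEX_LATENCY_US + FI_LATENCY_US + FEX_LATENCY_US

print(f"~{blade_to_blade_latency_us():.1f} us per blade-to-blade crossing")
# ~6.2 us, and the same figure applies between any two blades in the system,
# which is where the 'consistent predictable latency' claim comes from.
```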

The question then becomes: how is traffic between fabrics handled?  The answer is that it must be handled upstream (by the next-hop device(s), shown in the diagrams as the LAN cloud).  This is an important consideration when designing UCS implementations and selecting a redundancy/load-balancing behavior for server NICs.

Let's take a look at two examples: first a bare-metal OS (Windows, Linux, etc.), then a VMware server.

Bare-Metal Operating System

[Figure: two blades with active/passive NIC teaming on opposite fabrics]

In the diagram above we see two blades configured for active/passive NIC teaming using separate fabrics (within UCS this is done in the service profile).  This means blade 1 is using Fabric A as its primary path with B available for failover, and blade 2 is doing the opposite.  In this scenario any traffic sent from blade 1 to blade 2 would have to be handled by the upstream device depicted by the LAN cloud.  This is not necessarily an issue for the occasional frame, but it will impact performance for servers that communicate frequently.

Recommendation:

For bare-metal operating systems, analyze the blade-to-blade communication requirements and ensure chatty server-to-server applications are utilizing the same fabric as their primary (a quick sketch of this check follows the list below):

  • When using a card that supports hardware failover provide only one vNIC (made redundant through HW failover) and place its primary path on the same fabric as any other servers that communicate frequently.
  • When using cards that don’t support HW failover use active/passive NIC teaming and ensure that the active side is set to the same fabric for servers that communicate frequently.
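Here is the kind of sanity check I mean, as a minimal sketch; the blade names and fabric assignments are hypothetical, and nothing here is pulled from UCS Manager:

```python
# Hypothetical inventory: which fabric each blade's primary vNIC path uses,
# as set in its service profile (names and values are made up for illustration).
primary_fabric = {
    "blade1": "A",
    "blade2": "B",
    "blade3": "A",
}

# Server pairs known to exchange heavy east-west traffic.
chatty_pairs = [("blade1", "blade2"), ("blade1", "blade3")]

for left, right in chatty_pairs:
    if primary_fabric[left] != primary_fabric[right]:
        print(f"WARNING: {left} (fabric {primary_fabric[left]}) and {right} "
              f"(fabric {primary_fabric[right]}) will hairpin through the "
              "upstream LAN for every frame")
```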

VMware Servers

[Figure: VMware hosts connected to both fabrics]

In the above diagram we see that connectivity is the same from a physical perspective, but in this case we are using VMware as the operating system.  Here a vSwitch, vDS, or Cisco Nexus 1000v will be used to connect the VMs within the hypervisor.  Regardless of the VMware switching option the situation is the same: it is necessary to properly design the virtual switching environment to ensure that server-to-server communication is handled in the most efficient way possible.

Recommendation:

  • For half-width blades requiring 10GE or less total throughput, or full-width blades requiring 20GE or less total throughput, provide a single vNIC with hardware failover if available, or use an active/passive NIC configuration for the VMware switching.
  • For blades requiring the total active/active throughput of the available NICs, determine application profiles and utilize port-groups (port-profiles with the Nexus 1000v) to ensure active paths are the same for application groups which communicate heavily (see the sketch below).
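As a minimal sketch of that check, assuming a hypothetical table of port-group-to-uplink pinning (these names are not pulled from vCenter or the Nexus 1000v):

```python
# Illustrative data only: port-group names and uplink pinning are hypothetical.
port_group_active_uplink = {
    "App-Web":    "vmnic0",   # pinned to fabric A
    "App-DB":     "vmnic1",   # pinned to fabric B
    "Infra-Mgmt": "vmnic0",
}
uplink_fabric = {"vmnic0": "A", "vmnic1": "B"}

# Application groups whose members talk to each other constantly.
app_groups = {"App": ["App-Web", "App-DB"]}

for group, port_groups in app_groups.items():
    fabrics = {uplink_fabric[port_group_active_uplink[pg]] for pg in port_groups}
    if len(fabrics) > 1:
        print(f"{group}: members split across fabrics {sorted(fabrics)}; "
              "VM-to-VM traffic will be switched upstream")
```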

Summary:

UCS utilizes a unique switching design in order to provide high-bandwidth, low-latency switching with a greatly reduced management architecture compared to competing solutions.  The networking requires a thorough understanding in order to ensure architectural designs provide the greatest available performance.  Ensuring that application groups with high levels of server-to-server traffic are placed on the same path will provide maximum performance and minimal additional overhead on upstream networking equipment.


Shakespearean Guest Post

I got all Hamlet with my guest post on Thomas Jones' blog; check it out to answer the question 'To blade or not to blade.'

http://www.niketown588.com/2010/09/to-blade-or-not-to-blade.html


Data Center 101: Server Virtualization

Virtualization is a key piece of modern data center design.  Virtualization occurs on many devices within the data center; conceptually, virtualization is the ability to create multiple logical devices from one physical device.  We've been virtualizing hardware for years: VLANs and VRFs on the network, volumes and LUNs on storage, and even our servers were virtualized as far back as the 1970s with LPARs.  Server virtualization hit the mainstream in the data center when VMware began effectively partitioning clock cycles on x86 hardware, allowing virtualization to move from big iron to commodity servers.

This post is the next segment of my Data Center 101 series and will focus on server virtualization, specifically virtualizing x86/x64 server architectures.  If you’re not familiar with the basics of server hardware take a look at ‘Data Center 101: Server Architecture’ (http://www.definethecloud.net/?p=376) before diving in here.

What is server virtualization:

Server virtualization is the ability to take a single physical server system and carve it up like a pie (mmmm pie) into multiple virtual hardware subsets. 

[Figure: one physical server carved into multiple virtual machines]

Each Virtual Machine (VM), once created (or carved out), will operate in a similar fashion to an independent physical server.  Typically each VM is provided with a set of virtual hardware on which an operating system and set of applications can be installed as if it were a physical server.

Why virtualize servers:

Virtualization has several benefits when done correctly:

  • Reduction in infrastructure costs, due to less required server hardware.
    • Power
    • Cooling
    • Cabling (dependent upon design)
    • Space
  • Availability and management benefits
    • Many server virtualization platforms provide automated failover for virtual machines.
    • Centralized management and monitoring tools exist for most virtualization platforms.
  • Increased hardware utilization
    • Standalone servers traditionally suffer from utilization rates as low as 10%.  By placing multiple virtual machines with separate workloads on the same physical server, much higher utilization rates can be achieved.  This means you're actually using the hardware you purchased and are powering/cooling.

How does virtualization work?

Typically within an enterprise data center, servers are virtualized using a bare-metal hypervisor.  This is a virtualization operating system that installs directly on the server without the need for a supporting operating system.  In this model the hypervisor is the operating system and the virtual machine is the application.


Each virtual machine is presented a set of virtual hardware upon which an operating system can be installed.  The fact that the hardware is virtual is transparent to the operating system.  The key components of a physical server that are virtualized are:

  • CPU cycles
  • Memory
  • I/O connectivity
  • Disk


At a very basic level, memory and disk capacity, I/O bandwidth, and CPU cycles are shared amongst the virtual machines.  This allows multiple virtual servers to utilize a single physical server's capacity while maintaining a traditional OS-to-application relationship.  The reason this does such a good job of increasing utilization is that you're spreading several applications across one set of hardware.  Applications typically peak at different times, allowing for a more constant state of utilization.

For example, imagine an email server: typically it will peak at 9am, possibly again after lunch, and once more before quitting time.  The rest of the day it's greatly underutilized (that's why marketing email is typically sent late at night).  Now picture a traditional backup server; these historically run at night when other servers are idle to prevent performance degradation.  In a physical model each of these servers would have been architected for peak capacity to support the maximum load, but most of the day they would be underutilized.  In a virtual model they can both run on the same physical server and complement one another due to their varying peak times.
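Here is a toy illustration of that complement effect; the hourly utilization profiles are invented purely for illustration:

```python
# Invented hourly CPU-utilization profiles (percent of one physical server).
email_server  = [5]*8 + [60, 30, 20, 25, 45, 25, 20, 20, 35, 10] + [5]*6
backup_server = [70]*6 + [5]*12 + [60]*6

peak_alone = max(max(email_server), max(backup_server))
combined   = [e + b for e, b in zip(email_server, backup_server)]

print(f"Peak if each runs on its own box: {peak_alone}%")
print(f"Peak of both sharing one host:    {max(combined)}%")
print(f"Average utilization when shared:  {sum(combined)/len(combined):.0f}%")
```

Because the peaks are offset, the combined load never approaches the capacity of a single host, while the average utilization of that host rises well above what either standalone server achieved.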

Another example of the uses of virtualization is hardware refresh.  DHCP servers are a great example: they provide automatic IP addressing by leasing IP addresses to requesting hosts, and these leases are typically held for 30 days.  DHCP is not an intensive workload.  In a physical server environment it wouldn't be uncommon to have two or more physical DHCP servers for redundancy.  Because of the light workload these servers would use minimal hardware, for instance:

  • 800MHz processor
  • 512MB RAM
  • 1x 10/100 Ethernet port
  • 16GB internal disk

If this physical server were 3-5 years old, replacement parts and service contracts would be hard to come by; additionally, because of hardware advancements the server may be more expensive to keep than to replace.  When looking to refresh this server, the same hardware would not be available today; a typical minimal server today would be:

  • 1+ GHz dual- or quad-core processor
  • 1GB or more of RAM
  • 2x onboard 1GE ports
  • 136GB internal disk

The application requirements haven’t changed but hardware has moved on.  Therefore refreshing the same DHCP server with new hardware results in even greater underutilization than before.  Virtualization solves this by placing the same DHCP server on a virtualized host and tuning the hardware to the application requirements while sharing the resources with other applications.

Summary:

Server virtualization has a great many benefits in the data center, and as such companies are adopting more and more virtualization every day.  The overall reduction in overhead costs such as power, cooling, and space, coupled with the increased hardware utilization, makes virtualization a no-brainer for most workloads.  Depending on the virtualization platform that's chosen there are additional benefits such as increased uptime, distributed resource utilization, and increased manageability.


Data Center 101: Server Systems

As the industry moves deeper and deeper into virtualization, automation, and cloud architectures, it forces us as engineers to break free of our traditional silos.  For years many of us were able to do quite well being experts in one discipline with little to no knowledge of another.  Cloud computing, virtualization, and other current technological and business initiatives are forcing us to branch out beyond our traditional knowledge set and understand more of the data center architecture as a whole.

It was this concept that gave me the idea to start a new series on the blog covering the foundation topics of each of the key areas of the data center.  These will be lessons designed from the ground up to give you familiarity with a new subject or a refresher on an old one.  Depending on your background, some, none, or all of these may be useful to you.  As we get further through the series I will be looking for experts to post on subjects I'm not as familiar with; WAN and security are two that come to mind.  If you're interested in writing a beginner's lesson on one of those topics, or any other, please comment or contact me directly.

Server Systems:

As I've said in previous posts, the application is truly the heart of the data center.  Applications are the reason we build servers, networks, and storage systems; they are the email systems, databases, web content, etc. that run our businesses.  Applications run within the confines of an operating system, which interfaces directly with server hardware and firmware (discussed later) and provides a platform to run the application.  Operating systems come in many types, commonly Unix, Linux, and Windows, with other variants used for specialized purposes such as mainframes and supercomputers.

Because the server sits closer to the application than any other hardware, understanding server hardware and functionality is key.  Server hardware breaks down into several major components and concepts.  For this discussion we will stick with the common AMD/Intel architecture known as x86.

  • System board (motherboard): All components of a server connect via the system board.  The system board itself is a circuit board with specialized connectors for the server subcomponents, and it provides connectivity between each component of the server.
  • Central Processing Unit (CPU): The CPU is the workhorse of the server system, performing the calculations that allow the operating system and application to run.  Whatever work is being done by an application is being processed by the CPU.  A CPU is placed in a socket on the system board; each socket holds one CPU.
  • Random Access Memory (RAM): RAM is where data that is being used by the operating system and application, but not currently being processed, is stored.  For instance, the term 'load' typically refers to moving data from permanent storage (disk) into memory, where it can be accessed faster.  Memory is electronic and can be accessed very quickly, but it requires active power to maintain data, which is why it is known as volatile.
  • Disk: Disk is permanent storage media traditionally comprised of magnetic platters known as disks.  Other types of disk exist, including Flash disks, which provide much greater performance at a higher cost.  The key to disk storage is that it is non-volatile and does not require power to maintain data.  Disk can be either internal to the server or external in a separate device; commonly, server disk is consolidated in central storage arrays attached by a specialized network or network protocol.  Storage and storage networks will be discussed later in this series.
  • Input/Output (I/O): I/O comprises the methods of getting data in and out of the server.  I/O comes in many shapes and sizes, but the two primary methods used in today's data centers are Local Area Networks (LANs) using Ethernet as the underlying protocol and Storage Area Networks (SANs) using Fibre Channel as the underlying protocol (both networks will be discussed later in this series).  These networks attach to the server using I/O ports typically found on expansion cards.
  • System bus: The system bus is the series of paths that connect the CPU to memory.  This is specific to the CPU vendor.
  • I/O bus: The I/O bus is the path that connects the expansion cards (I/O cards) to the CPU and memory.  Several standards exist for these connections, allowing multiple vendors to interoperate without issue.  The most common bus type for modern servers is PCI Express (PCIe), which supports greater bandwidth than previous bus types, allowing higher bandwidth networks to be used.
  • Firmware: Firmware is low-level software that is commonly hard-coded onto hardware chips.  Firmware runs the hardware device at a low level and interfaces with the BIOS.  In most modern server components the firmware can be updated through a process called 'flashing.'
  • Basic I/O System (BIOS): The BIOS is a type of firmware stored in a chip on the system board.  The BIOS is the first code loaded when a server boots and is primarily responsible for initializing hardware and loading an operating system.

Server

[Figure: two-socket server component layout]

The diagram above shows a two-socket server.  Starting at the bottom you can see the disks, in this case internal Hard Disk Drives (HDDs).  Moving up you can see two sets of memory and CPUs, followed by the I/O cards and power supplies.  The power supplies convert AC power to the appropriate DC voltage levels for use in the system.  Not shown are the fans that move air through the system for cooling.

The bus systems, which are not shown, would be a series of traces and chips on the system board allowing separate components to communicate.

A Quick Note About Processors:

Processors come in many shapes and sizes and were traditionally rated by speed measured in hertz.  Over the last few years a new concept has been added to processors: 'cores.'  Simply put, a core is a CPU placed on a chip beside other cores, with which it shares certain components such as cache and the memory controller (both outside the scope of this discussion).  If a processor has 2 cores it will operate as if it were 2 physically independent, identical processors and provide the advantages of such.

Another technology, hyper-threading, has been around for quite some time.  A processor can traditionally only process one calculation per cycle (measured in hertz); this is known as a thread.  Many of these calculations only use a small portion of the processor, leaving other portions idle.  Hyper-threading allows a processor to schedule 2 threads in the same cycle as long as they don't require overlapping portions of the processor.  For applications able to utilize multiple threads, hyper-threading provides an average performance increase of approximately 30%, whereas a second core would double performance.

Hyper-threading and multiple cores can be used together, as they are not mutually exclusive.  For instance, in the diagram above, if both installed processors were 4-core processors that would provide 8 physical cores; with hyper-threading enabled the system would present a total of 16 logical processors.
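As a quick worked version of that example (the ~30% hyper-threading gain is the rough rule of thumb cited above, not a measured value):

```python
sockets = 2
cores_per_socket = 4
hyperthreading = True

physical_cores = sockets * cores_per_socket                         # 8
logical_processors = physical_cores * (2 if hyperthreading else 1)  # 16

# Very rough relative throughput for a well-threaded workload, using the
# ~30% hyper-threading gain and the 2x-per-core scaling described above.
relative_throughput = physical_cores * (1.3 if hyperthreading else 1.0)

print(f"{physical_cores} physical cores, {logical_processors} logical processors, "
      f"~{relative_throughput:.1f}x the throughput of a single core")
```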

Not all applications and operating systems can take advantage of multiple processors and cores, therefore it is not always advantageous to have more cores or processors.  Proper application sizing and tuning is required to properly match the number of cores to the task at hand.


Server Startup:

When a server is first powered on, the BIOS is loaded from EEPROM (Electrically Erasable Programmable Read-Only Memory) located on the system board.  While the BIOS is in control it performs a Power On Self Test (POST), ensuring the basic operability of the main system components.  From there it detects and initializes key components such as keyboard, video, mouse, etc.  Last, the BIOS searches the available bootable media for a device containing a valid Master Boot Record (MBR).  It then loads that record and allows its code to take over and load the operating system.

The order and devices the BIOS searches are configurable in the BIOS settings.  Typical boot devices are:

  • CD/DVD-ROM
  • USB
  • Internal Disk
  • Internal Flash
  • iSCSI SAN
  • Fibre Channel SAN

Boot order is very important when there is more than one available boot device, for instance when booting to a CD-ROM to perform recovery of an installed operating system.  It is also important to note that both iSCSI and Fibre Channel network-connected disks are handled by the operating system as if they were internal Small Computer System Interface (SCSI) disks.  This becomes very important when configuring non-local boot devices.  SCSI as a whole will be covered later in this series.
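As a simplified sketch of the boot-device search described above (device names and probe results are hypothetical; a real BIOS reads the first sector and checks for the 0x55AA boot signature in its last two bytes):

```python
# Simplified model of the boot-device search: walk the configured boot order
# and hand control to the first device with a valid MBR.
boot_order = ["CD/DVD-ROM", "USB", "Internal Disk", "Fibre Channel SAN"]

# Hypothetical probe results for this example.
has_valid_mbr = {
    "CD/DVD-ROM": False,
    "USB": False,
    "Internal Disk": True,
    "Fibre Channel SAN": True,
}

def find_boot_device(order):
    for device in order:
        if has_valid_mbr.get(device, False):
            return device
    return None  # a real BIOS would report 'no bootable device found'

print("Booting from:", find_boot_device(boot_order))  # Internal Disk
```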

Operating System:

Once the BIOS is done getting things ready and has transferred control to the bootable data in the MBR, that code takes over; this is the operating system (OS).  The OS is the interface between the user/administrator and the server hardware.  It provides a common platform for various applications to run on and handles the interface between those applications and the hardware.  In order to properly interface with hardware components the OS requires drivers for that hardware.  Essentially, drivers are OS-level software that allow any application running in the OS to properly interface with the firmware running on the hardware.

Applications:

Applications come in many different forms to provide a wide variety of services.  Applications are the core of the data center and are typically the most difficult piece to understand.  Each application whether commercially available or custom built has unique requirements.  Different applications have different considerations for processor, memory, disk, and I/O.  These considerations become very important when looking at new architectures because any change in the data center can have significant effect on application performance.

Summary:

The server architecture goes from the I/O inputs through the server hardware to the application stack.  A proper understanding of this architecture is vital to application performance, and applications are the purpose of the data center.  Servers consist of a set of major components: CPUs to process data, RAM to store data for fast access, I/O devices to get data in and out, and disk to store data permanently.  This system is put together for the purpose of serving an application.

This post is the first in a series intended to build a foundation in data center concepts.  If you're starting from scratch they may all be useful; if you're familiar with one or two aspects then pick and choose.  If this series becomes popular I may do a 202 series as a follow-on.  If I missed something here, or made a mistake, please comment.  Also, if you're a subject matter expert in a data center area and would like to contribute a foundation blog in this series, please comment or contact me.


Why Cisco UCS is my ‘A-Game’ Server Architecture

A-Game:

When I discuss my A-Game, I mean my go-to hardware vendor for a specific data center component.  For example I have an A-Game platform for:

  • Storage
  • SAN
  • LAN (access-layer LAN specifically; you don't want me near your aggregation, core, or WAN)
  • Servers and Blades (traditionally this has been one vendor for both)

As this post is in regards to my server A-Game I’ll leave the rest undefined for now and may blog about them later.

Over the last 4 years I've worked in some capacity or another as an independent customer advisor or consultant with several vendor options to choose from.  This has been either with a VAR or with a strategic consulting firm (www.fireflycom.net).  In both cases there is typically a company lean one way or another, but my role has given me the flexibility to choose the right fit for the customer rather than for my company or the vendors, which is what I personally strive to do.  I'm not willing to stake my own integrity on what a given company wants to push today.  I've written about my thoughts on objectivity in a previous blog (http://www.definethecloud.net/?p=112).

Another rule in regards to my A-Game is that it’s not a rule, it’s a launching point.  I start with a specific hardware set in mind in order to visualize the customer need and analyze the best way to meet that need.  If I hit a point of contention that negates the use of my A-Game I’ll fluidly adapt my thinking and proposed architecture to one that better fits the customer.  These points of contention may be either technical, political, or business related:

  • Technical: My A-Game doesn’t fit the customers requirement due to some technical factor, support, feature, etc.
  • Political: My A-Game doesn’t fit the customer because they don’t want Vendor X (previous bad experience, hype, understanding, etc.)
  • Business: My A-Game isn’t on an approved vendor list, or something similar.

If I hit one of these roadblocks I'll shift my vendor strategy for that particular engagement without a second thought.  The exception is when one of these roadblocks isn't actually a roadblock and my A-Game clearly provides the best fit for the customer; in that case I'll work with the customer to analyze the actual requirements and attempt to find a way around the roadblock.

Basically, my A-Game is a product or product line that I've personally tested, worked with, and trust above the others, and it is my starting point for any consultative engagement.

A quick read through my blog page or a jump through my links will show that I work closely with Cisco products, and it would be easy to assume that I am therefore inherently skewed towards Cisco.  In reality the opposite is true: over the last few years I've had the privilege of selecting my job(s) and role(s) based on the products I want to work with.

My sordid UCS history:

As anyone who's worked with me can attest, I'm not one to pull punches, feign friendliness, or accept what you try to sell me based on a flashy slide deck or eloquent rhetoric.  If you're presenting to me, don't expect me to swallow anything without proof, don't expect easy questions, and don't show up if you can't put the hardware in my hands to cash the checks your slides write.  When I'm presenting to you, I expect and encourage the same.

Prior to my exposure to UCS I worked with both IBM and HP servers and blades.  I am an IBM Certified Blade Expert (although the certification is dated at this point).  IBM was in fact my A-Game server and blade vendor.  This had a lot to do with the technology of the IBM systems as well as the overall product portfolio IBM brought with it.  That being said, I'd also be willing to concede that HP blades have moved above IBM's in technology and innovation, although IBM's MAX5 is one way IBM is looking to change that.

When I first heard about Cisco's launch into the server market I thought, and hoped, it was a joke.  I expected some Frankenstein of a product where I'd place server blades in Nexus or Catalyst chassis.  At the time I was working heavily with the Cisco Nexus product line, primarily the 5000, 2000, and 1000v, and I was very impressed with these products, the innovation involved, and the overall benefit they'd bring to the customer.  But all the love in the world for the Nexus line couldn't overcome my feeling that there was no way Cisco could successfully move into servers.

Early in 2009 my resume was submitted, among several others from my company, to Learning at Cisco and the business unit in charge of UCS.  This was part of an application process for learning partners to be invited to the initial external Train The Trainer (TTT) and to participate in training Cisco, partners, and customers worldwide on UCS.  Two other engineer/trainers (Dave Alexander and Fabricio Grimaldi) and I were selected from my company to attend.  The first interesting thing about the process was that the three of us were selected over CCIEs, double CCIEs, and more experienced instructors from our company based on our server backgrounds.  It seemed Cisco really was looking to push servers, not some network adaptation.

During the TTT I remained very skeptical.  The product looked interesting but not 'game-changing.'  The user interfaces were lacking and definitely showed their alpha and beta colors.  Hardware didn't always behave as expected, and the real business/technological benefits of the product didn't shine through.  That being said, remember that at this point the product was months away from launch and we were working with a very beta version of the hardware and software.  Regardless of the underlying reasons, I walked away from the TTT feeling fully underwhelmed.

I spent the time on my flight back to the East Coast from San Jose looking through my notes and thinking about the system and components.  It definitely had some interesting concepts but I didn’t feel it was a platform I would stake my name to at this point.

Over the next couple of months Fabricio Grimaldi and I assisted Dave Alexander (http://theunifiedcomputingblog.com) in developing the UCS Implementation certification course.  Through this process I spent a lot of time digging into the underlying architecture, relating it back to my server admin days, and whiteboarding the concepts and connections in my home office.  Additionally, I got more and more time on the equipment to 'kick the tires.'  During this process Dave, Fabricio, and I began instructing an internal Cisco course known as UCS Bootcamp.  The course was designed for Cisco engineers in both pre-sales and post-sales roles and focused specifically on the technology as a product deep dive.

It was over these months of having discussions on the product, wrapping my head around the technology, and developing training around the components that the lock cylinders in my brain started to click into place and the key finally turned: UCS changes the game for server architecture.  The skeptic had become a convert.

UCS the game changer:

The term 'game changer' gets thrown around all willy-nilly in this industry.  Every minor advancement is touted by its owner as a 'game changer.'  In reality, game changers are few and far between.  In order to qualify you must actually shift the status quo, not just improve upon it.  To use vacuums as an example: if your vacuum sucks harder, it just sucks harder; it doesn't change the game.  A Dyson may vacuum better than anyone else's, but Roomba (http://www.irobot.com/uk/home_robots.cfm) is the one that changed the game.  With a Dyson I still have to push the damn thing around the living room; with a Roomba I watch it go.

In order to understand why UCS changes the game rather than improving upon it, you first need to define UCS:

UCS is NOT a blade system; it is a server architecture.

Cisco's Unified Computing System (UCS) is not all about blades; it is about rack-mount servers, blade servers, and management being used as a flexible pool of computing resources.  Because of this it has been likened to an x86-64-based mainframe system.

UCS takes a different approach to the original blade system designs.  It's not a solution for data center point problems (power, cooling, management, space, cabling) in isolation; it's a redefinition of the way we do computing.

Instead of asking 'How can I improve upon current architectures?'

Cisco/Nuova asked

'What's the purpose of the server, and what's the best way to accomplish that goal?'

Many of the ideas UCS utilizes have been tried and implemented in other products before: Unified I/O, single point of management, modular scalability, etc., but never all in one cohesive design.

There are two major features of UCS that I call ‘the cake’ and three more that are really icing.  The two cake features are the reason UCS is my A-Game and the others just further separate it.

  • Unified Management
  • Workload Portability

Unified Management:

Blade architectures are traditionally built with space savings as a primary concern.  To do this, a blade chassis is built with shared LAN, SAN, power, and cooling infrastructure and an onboard management system to control server hardware access, fan speeds, power levels, etc.  M. Sean McGee describes this much better than I could hope to in his article 'The Mini-Rack Approach to Blade Design' (http://bit.ly/bYJVJM).  This traditional design saves space and can also save on overall power, cooling, and cabling, but it causes pain points in management, among other considerations.

UCS was built from the ground up with a different approach, and Cisco has the advantage of zero legacy server investment, which allows them to execute on it.  The UCS approach is:

  • Top-of-rack networking should be top-of-rack, not repeated in each blade chassis.
  • Management should encompass the entire architecture, not just a single chassis.
  • Blades are only 40% of the data center server picture; rack-mounts should not be excluded.

The UCS Approach

[Figure: the UCS approach to networking and management]

The key difference here is that all management of the LAN, SAN, server hardware, and the chassis itself is pulled into the access layer and performed on the UCS Fabric Interconnect, which provides all of the switching and management functionality for the system.  The system was built from the ground up with this in mind, and as such this is designed into each hardware component.  Other systems that provide a single point of management do so by layering additional hardware and software components on top in order to manage the underlying component managers.  Additionally, those other systems only manage blade enclosures, while UCS is designed to manage both blades and traditional rack-mounts from one point (this functionality will be available in firmware by the end of CY10).

To put this in perspective, Cisco UCS provides a rapid, repeatable physical server deployment model very similar to the virtual server deployment model VMware provides.  Through the use of granular Role Based Access Control (RBAC), UCS ensures that organizational changes are not required, while at the same time providing the flexibility to streamline people and process if desired.

Workload Portability:

Workload portability has significant benefits within the data center; the concept itself is usually described as 'statelessness.'  If you're familiar with VMware, this is the same flexibility VMware provides for virtual machines, i.e. there is no tie to the underlying hardware.  One of the key benefits of UCS is the ability to apply this type of statelessness at the hardware level.  This removes the tie of the server or workload to the blade or slot it resides in, and it provides major flexibility for maintenance and repair cycles, as well as deployment times for new or expanding applications.

Within UCS, all management is performed on the Fabric Interconnect through the UCS Manager GUI or CLI.  This includes any network configuration for blades, chassis, or rack-mounts, and all server configuration including firmware, BIOS settings, NIC/HBA settings, and boot order, among other things.  The actual blade is configured through an object called a 'service profile.'  This profile defines the server on the network as well as the way in which the server hardware operates (BIOS/firmware, etc.).

All of the settings contained within a service profile are traditionally configured, managed, and stored in hardware on a server.  Because these are now defined in a configuration file, the underlying hardware tie is stripped away and a server workload can be quickly moved from one physical blade to another without requiring changes to the networks or storage arrays.  This decreases maintenance windows and speeds roll-out.

Within UCS, service profiles can be created from templates and pools, which is unique to UCS.  This further increases the benefits of service profiles and decreases the risk inherent in multiple configuration points and case-by-case deployment models.

UCS Profiles and Templates

[Figure: UCS service profiles and templates]
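To make the service profile and template concept concrete, here is a toy sketch.  This is not the UCS Manager object model or API; the pools, addresses, and blade names are made up, and it only illustrates identity living in the profile rather than in the blade:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Toy model only -- not the UCS Manager object model or API, just an
# illustration of identity living in the profile rather than in the blade.
@dataclass
class ServiceProfile:
    name: str
    macs: List[str]
    wwpns: List[str]
    boot_order: List[str] = field(default_factory=lambda: ["vHBA0-SAN", "local-disk"])
    associated_blade: Optional[str] = None

# 'Pools' of identity addresses that a template draws from.
mac_pool = iter(["00:25:B5:00:00:01", "00:25:B5:00:00:02"])
wwpn_pool = iter(["20:00:00:25:B5:01:00:01", "20:00:00:25:B5:01:00:02"])

def profile_from_template(name: str) -> ServiceProfile:
    """'Template' that stamps out profiles using addresses drawn from the pools."""
    return ServiceProfile(name=name, macs=[next(mac_pool)], wwpns=[next(wwpn_pool)])

web01 = profile_from_template("web01")
web01.associated_blade = "chassis1/blade3"

# Blade fails or needs maintenance: move the identity, not the cabling or zoning.
web01.associated_blade = "chassis2/blade5"
print(web01.macs, web01.wwpns, "now running on", web01.associated_blade)
```

The point of the sketch is simply that re-associating the profile moves the MACs, WWNs, boot order, and firmware policy with the workload, so the upstream network and SAN zoning see the same identity regardless of which blade is underneath.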

These two features, and their real-world applications and value, are what place UCS in my A-Game slot.  These features will provide benefits to ANY server deployment model and are unique to UCS.  While subcomponents exist with other vendors, they are not:

  • Designed into the hardware
  • Fully integrated without the need for additional hardware, software, and licensing
  • As robust

Icing on the cake:

  • Dual socket server memory scalability and flexibility (Cisco memory expander technology)
  • Integration with VMware and advanced networking for virtual switching
  • Unified fabric (I/O consolidation)

Each of these features also offers real-world benefits, but the real heart of UCS is the unified management and server statelessness.  You can find more information on these other features through various blogs and Cisco documentation.

When is it time for my B-Game?:

By now you should have an understanding as to why I chose UCS as my A-Game (not to say you necessarily agree, but that you understand my approach).  So what are the factors that move me towards my B-Game?  I will list three considerations and the qualifying questions that would finalize a decision to position a B-Game system:

  • InfiniBand: If the customer is using InfiniBand for networking, UCS does not currently support it.  I would first assess whether there was an actual requirement for InfiniBand or whether it was just the best option at the time of the last refresh.  If InfiniBand is required I would move to another solution.
  • Non-Intel processors: A requirement for non-Intel processors would steer me towards another vendor, as UCS does not currently support them.  As above, I would first verify whether non-Intel was a requirement or a choice.
  • Requirement for chassis-based storage: If a customer has a requirement for chassis-based storage, there is no current Cisco offering for this within UCS.  This is, however, very much a corner case and only a configuration I would typically recommend for single-chassis deployments with little need to scale.  In-chassis storage becomes a bottleneck rather than a benefit in multi-chassis configurations.

While there are other reasons I may have to look at another product for a given engagement, they are typically few and far between.  UCS has the right combination of entry point and scalability to fit the great majority of server deployments.  Additionally, as a newer architecture there is no concern about the architectural refresh cycle other vendors face.  As other blade solutions continue to age, there will be increased risk to the customer in regard to forward compatibility.

Summary:

UCS is not the only server or blade system on the market, but it is the only complete server architecture.  Call it consolidated, unified, virtualized, whatever, but there isn't another platform that combines rack-mounts and blades under a single architecture with a single management window and tools for rapid deployment.  The current offering is appropriate for a great majority of deployments and will continue to get better.

If you're considering a server refresh or new deployment, it would be a mistake not to take a good look at the UCS architecture.  Even if it's not what you choose, it may give you some ideas as to how you want to move forward, or features to ask your chosen vendor for.

Even if you never buy a UCS server you can still thank Cisco for launching UCS.  The lower pricing you're getting today, and the features being put in place on other vendors' product lines, are being driven by a new server player in the market and the innovation they launched with.

Comments, concerns, complaints always appreciated!


Virtualization

While not a new concept, virtualization has hit the mainstream over the last few years and become an uncontrollable buzzword driven by VMware and other server virtualization platforms.  Virtualization has been around in many forms for much longer than some realize; things like logical partitions (LPARs) on IBM mainframes have been around since the 80s and have been extended to other non-mainframe platforms.  Networks have been virtualized with VLANs for years.  The virtualization term now gets used for all sorts of things in the data center; like it or not, the term doesn't look like it's going away anytime soon.

Virtualization in all of its forms is a pillar of cloud computing, especially in private/internal cloud architectures.  To define it loosely for the purpose of this discussion, let's use 'the ability to divide a single hardware device or infrastructure into separate logical components.'

Virtualization is key to building cloud-based architectures because it allows greater flexibility and utilization of the underlying equipment.  Rather than requiring separate physical equipment for each 'tenant,' multiple tenants can be separated logically on a single underlying infrastructure; this concept is also known as 'multi-tenancy.'  Depending on the infrastructure being designed, a tenant can be an individual application, an internal team/department, or an external customer.  There are three areas to focus on when discussing a migration to cloud computing: servers, network, and storage.
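As a minimal illustration of multi-tenancy on shared infrastructure, here is a sketch using the logical constructs covered in the network virtualization section below (VLANs, VSANs, VRFs); the tenant names and IDs are invented:

```python
# Hypothetical mapping of tenants onto shared infrastructure using logical
# constructs (VLAN/VSAN/VRF); IDs and names are invented for illustration.
tenants = {
    "hr-app":        {"vlan": 110, "vsan": 11, "vrf": "tenant-hr"},
    "finance-app":   {"vlan": 120, "vsan": 12, "vrf": "tenant-fin"},
    "external-cust": {"vlan": 130, "vsan": 13, "vrf": "tenant-ext"},
}

# All three tenants ride the same physical switches, arrays, and servers;
# the separation is purely logical.
for name, ids in tenants.items():
    print(f"{name}: VLAN {ids['vlan']}, VSAN {ids['vsan']}, VRF {ids['vrf']}")
```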

Server Virtualization:

Within the x86 server platform (typically the Windows/Linux environment), VMware is the current server virtualization leader.  Many competitors exist, such as Microsoft's Hyper-V and Xen for Linux, and they are continually gaining market share.  The most common form of server virtualization allows a single physical server to be divided into logical subsets by creating virtual hardware; this virtual hardware can then have an operating system and application suite installed and will operate as if it were an independent server.  Server virtualization comes in two major flavors: bare-metal virtualization and OS-based virtualization.

Bare-metal virtualization means that a lightweight, virtualization-capable operating system is installed directly on the server hardware and provides the functionality to create virtual servers.  OS-based virtualization operates as an application or service within an OS, such as Microsoft Windows, that provides the ability to create virtual servers.  While both methods are commonly used, bare-metal virtualization is typically preferred for production use due to the reduced overhead involved.

Server virtualization provides many benefits, but the key benefits for cloud environments are increased server utilization and operational flexibility.  Increased utilization means that less hardware is required to perform the same computing tasks, which reduces overall cost.  The increased flexibility of virtual environments is key to cloud architectures: when a new application needs to be brought online it can be done without procuring new hardware, and, equally important, when an application is decommissioned the physical resources are automatically available for reuse without server repurposing.  Physical servers can be added seamlessly when capacity requirements increase.

Network Virtualization:

Network virtualization comes in many forms.  VLANs, LSANs, and VSANs allow a single physical LAN or SAN architecture to be carved up into separate networks without dependence on the physical connections.  Virtual Routing and Forwarding (VRF) allows separate routing tables to be used on a single piece of hardware to support different routes for different purposes.  Additionally, technologies exist that allow individual network hardware components to be virtualized in a similar fashion to what VMware does on servers.  All of these tools can be used together to provide the proper underlying architecture for cloud computing.  The benefits of network virtualization are very similar to those of server virtualization: increased utilization and flexibility.

Storage Virtualization:

Storage virtualization encompasses a broad range of topics and features.  The term has been used to describe anything from underlying RAID configuration and partitioning of the disk to things like IBM's SVC and NetApp's V-Series, both used for managing heterogeneous storage.  Without getting into what's right and wrong when talking about storage virtualization, let's look at what is required for cloud.

First, consolidated storage itself is a big part of cloud infrastructures in most applications.  Having the data in one place to manage can simplify the infrastructure and also increases the available feature set, especially when virtualizing servers.  At a top level, when looking at storage for cloud environments there are two major considerations: flexibility and cost.  The storage should have the right feature set and protocol options to support the initial design goals, and it should also offer the flexibility to adapt as business requirements change.  Several vendors offer great storage platforms for cloud environments depending on the design goals and requirements.  Features that are typically useful for the cloud (and sometimes lumped into virtualization) are:

De-Duplication – Maintaining a single copy of duplicate data, reducing overall disk usage.

Thin-provisioning – Optimizes disk usage by allowing disks to be assigned to servers/applications based on predicted growth while consuming only the space actually written.  This allows applications to grow without pre-consuming disk.

Snapshots – A low-disk-use, point-in-time record that can be used for operations like point-in-time restores.
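To make the de-duplication concept concrete, here is a toy block-level sketch; real arrays work on fixed-size blocks with far more sophisticated hashing and metadata:

```python
import hashlib

def dedupe(blocks):
    """Store each unique block once; duplicates become pointers to the stored copy."""
    store, layout = {}, []
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)   # keep only the first copy of each block
        layout.append(digest)             # logical view: pointers back to the store
    return store, layout

blocks = [b"OS image block"] * 3 + [b"application data"]   # 4 logical blocks
store, layout = dedupe(blocks)
print(f"{len(layout)} logical blocks stored as {len(store)} physical blocks")
# 4 logical blocks stored as 2 physical blocks
```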

Overall, end-to-end virtualization is the foundation of cloud environments, allowing for flexible, highly utilized infrastructures.
