Why Oracle’s 72 port 10GE switch doesn’t matter

I recently ran into some internal buzz about Oracle’s 72 port ‘top-of-rack’ switch announcement and it piqued my interest, so I started taking a look.  Oracle selling a switch is definitely interesting on the surface, but then again they did just purchase Sun for a bargain basement price, and Sun does make hardware, pretty good hardware at that.  Here is a quick breakdown of the switch:

  • Size: 1RU
  • Port count: 72x 10GE or 16x 40GE
  • Oversubscription: none (fully non-blocking)
  • L3 routing: yes
  • Price: $79,200 list

Two three letter words came to mind when I saw this: wow, and why.  Wow is definitely in order, I mean wow!  Packing 72 non-blocking 10GE ports into a 1RU switch chassis is impressive, very impressive.  I’m dying to get a look at the hardware.  Now for the why:

Why does Oracle think they can call a 72 port switch a top-of-rack switch?  1RU form factor doth not a ToR make.  Do you have 72 10GE ports in a rack in your data center?  This switch is really a middle-of-row or end-of-row switch.  Once you move it into that position you’ve got some cabling to think about: $1,000 or so times two per link for optics, another couple hundred for that nice long cable, times 72 links, plus the cost of running and maintaining those cables… think ‘Holy shit Batman, my $79,200 ToR switch just became a $200,000+ EoR switch with a different management model from the rest of my shop.’

Why does Oracle think there is a need for full non-blocking bandwidth for every access layer port?  Is anyone seriously driving sustained 10GE on multiple devices at once, anyone?  You’ve got two options in switching and only one actually makes sense: either reduce cost and implement oversubscription in hardware, or pay for full rate hardware that is still oversubscribed in your network designs because you aren’t using 1:1 server to inter-switch links.  Before deciding how much you really need line-rate bandwidth, do yourself a favor and take a look at your I/O profile across a few servers for a week or two.  If you’re like the majority of data centers you’ll find that you’ll be quite fine with 8:1 or higher oversubscription with 10GE at the access layer.
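To put some back-of-the-napkin numbers on that, here’s a quick sketch in plain Python.  The utilization figures are hypothetical, picked only to illustrate the check I’m suggesting you run against your own measured I/O profiles:

```python
def oversubscription_ratio(access_ports, port_speed_gbps, uplink_count, uplink_speed_gbps):
    """Ratio of total access-layer bandwidth to total uplink bandwidth."""
    downstream = access_ports * port_speed_gbps
    upstream = uplink_count * uplink_speed_gbps
    return downstream / upstream

# A 32-port 10GE access switch with 8x 10GE uplinks:
print(oversubscription_ratio(32, 10, 8, 10))  # -> 4.0 (a 4:1 design)

# Hypothetical measurement: 32 servers averaging a 1.2 Gbps peak each.
# At 8:1 oversubscription, 320 Gbps of access bandwidth rides on 40 Gbps
# of uplink, and the measured load still fits with headroom.
servers, avg_peak_gbps = 32, 1.2
uplink_gbps = 40
print(servers * avg_peak_gbps <= uplink_gbps)  # -> True (38.4 <= 40)
```

The point isn’t the exact numbers; it’s that sustained utilization, not port speed, should drive how much oversubscription you accept.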

Why would I want to buy a 10GE switch today that has no support for DCB or FCoE?  Whether you like it or not FCoE is here; both Cisco and HP are backing it strongly with products shipping and more on the way.  Emulex and QLogic are both in their second generation of Converged Network Adapters (CNA); see my take on Emulex’s, known as the OneConnect adapter (http://www.definethecloud.net/?p=382).  The standards are all ratified, and even TRILL is soon to be ratified to provide that beautiful Spanning-Tree free utopian network you’ve dreamed of since childhood.  If I’m an all NFS or iSCSI shop maybe this doesn’t bother me, but if I’m running Fibre Channel there is no way I’m locking myself into 10GE at the access layer without IEEE standard DCB and FCoE capabilities in the hardware.

What it really comes down to is that this switch is meaningless in the average enterprise data center.  Where this switch fits and has purpose is in specialized multi-rack appliances and clusters.  If you buy an Oracle multi-rack system or cluster from Oracle this will be one option for connectivity.  With any luck they won’t force you into this switch because there are better options.

Thanks to my colleague for helping me out with some of this info.

Kudos: I do want to give Oracle kudos on the QSFP, which is the heart of how they were able to put 72 10GE ports in a 1RU design.  The QSFP is a 40GE port that can optionally be split into 4 individual 10GE links.  It’s definitely a very cool concept and will hopefully see greater industry adoption.

How to build the 10GE network of your dreams:

One of the things I love about the Oracle 10GE switch is that it highlights exactly what Cisco is working to fix in data center networking with the Nexus 5000 and 2000.

Note: Full disclosure and all that jazz, I work for a Cisco reseller and as part of my role I work closely with Cisco Nexus products.  That being said I chose the role I’m in (and the role chose me) because I’m a big fan and endorser of those products not the other way around.  To put it simply, I love the Nexus product line because I love the Nexus product line, I just so happen to be lucky enough to have a job doing what I love.

So now, stepping off my soapbox and out of disclosure mode, let’s get to the ‘what the hell is Joe talking about’ portion of this post.


In the diagram above I’m showing two Nexus 5020s in green at the top and 10 pairs of Nexus 2232s connected to them.  What this creates is a redundant 320 port 10GE fabric with 2 points of management, because the Nexus 2000 is just a remote line card of the Nexus 5000.  All of this comes with two other great features: latency under 5us and FCoE support.  Additionally this puts a 2K at the top of each rack, allowing ToR cabling while keeping all management and administration at the 5K in the middle-of-row.  Because the system also supports Twinax cabling there is a cost savings of thousands of dollars per rack over fibre cabling to a ToR or EoR design.  There is not another solution on the market that comes close to this today.  All of this at a 4:1 oversubscription rate at the access layer.  If you’re willing to oversubscribe a little more you could actually add 2 more redundant Nexus 2000s for another 64 ports, capping at 384 ports.
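The port math above can be sketched out quickly.  This assumes each Nexus 2232 fabric extender provides 32 host-facing 10GE ports and 8 fabric uplinks, with the extenders deployed in redundant pairs as described:

```python
# Assumed per-FEX numbers for the Nexus 2232: 32 host ports, 8 fabric uplinks.
HOST_PORTS_PER_FEX = 32
UPLINKS_PER_FEX = 8

fex_pairs = 10
total_ports = fex_pairs * HOST_PORTS_PER_FEX    # redundant pairs -> 320 usable ports
oversub = HOST_PORTS_PER_FEX / UPLINKS_PER_FEX  # 32x 10GE down / 8x 10GE up
print(total_ports, oversub)  # -> 320 4.0

# Trading a little more oversubscription for scale: 2 more redundant pairs.
print((fex_pairs + 2) * HOST_PORTS_PER_FEX)  # -> 384
```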

This entire solution comes in at or below the price of 2 of Oracle’s switches before considering the cost savings on cabling.


I don’t believe Oracle’s 72 port switch has a market in the average data center.  It will have specialized use cases, and it is quite an interesting play.  The best thing it has to offer is the QSFP, which hopefully will gain some buzz and vendor support thanks to Oracle.


Data Center 101: Local Area Network Switching

Interestingly enough, 2 years ago I couldn’t even begin to post an intelligent blog on Local Area Networking 101; funny how things change.  That being said, I make no guarantees that this post will be intelligent in any way.  Without further ado, let’s get into the second part of the Data Center 101 series and discuss the LAN.

I find the best way to understand a technology is to have a grasp on its history and the problems it solves, so let’s take a minute to dive into the history of the LAN.  For the sake of simplicity and real world applicability I’m going to stick to Ethernet as it is the predominant LAN technology in today’s data center environments.  Before we even go into the history we’ll define Ethernet and where it fits on the OSI model.


Ethernet is a frame-based networking technology comprising a set of standards for Layers 1 and 2 of the OSI model.  Ethernet devices use an address called a Media Access Control (MAC) address for communication.  MAC addresses are a flat address space which is not routable (it can only be used on a flat Layer 2 network), and each address is composed of several components, most importantly a vendor ID known as an Organizationally Unique Identifier (OUI) and a unique address for the individual port.
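To make the OUI/device split concrete, here’s a small sketch in Python.  The helper name and the sample address are mine for illustration, not any standard API:

```python
def parse_mac(mac: str):
    """Split a MAC address into its OUI (vendor) and device-specific halves."""
    octets = mac.split(":")
    if len(octets) != 6 or not all(len(o) == 2 for o in octets):
        raise ValueError("expected six colon-separated octets")
    oui = ":".join(octets[:3]).upper()     # first 3 octets: Organizationally Unique Identifier
    device = ":".join(octets[3:]).upper()  # last 3 octets: vendor-assigned unique portion
    return oui, device

print(parse_mac("00:1b:21:3c:4d:5e"))  # -> ('00:1B:21', '3C:4D:5E')
```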

OSI Model:

The Open Systems Interconnection (OSI) model is a sub-division of the components of communication that is used as a tool to create interoperable network systems, and is a fantastic model for learning networks.  The OSI model breaks into 7 layers, much like my favorite taco dip.

Understanding the OSI model and where protocols and hardware fit into it will not only help you learn but also help with understanding new technologies and how they fit together.  I often revert back to placing concepts in terms of the OSI model when having highly technical discussions about new concepts and technology.  The beauty of the model is that it allows for easy interoperability and flexibility.  For instance, Ethernet is still Ethernet whether you use fiber cables or copper cables, because only Layer 1 is changing.

Ethernet LAN History:

As the LAN networks we use today evolved, they typically started with individual groups within an organization.  For instance, a particular group would have a requirement for a database server and would purchase a device to connect that group.  That device was commonly a hub.


A network hub is a device with multiple ports used to connect several devices for the purposes of network communication.  When an Ethernet hub receives a frame it replicates it to all connected ports except the one it received it on in a process called flooding.  All connected devices receive a copy of the frame and will typically only process the frame if the destination MAC address is their own (there are exceptions to this which are beyond the scope of this discussion.)


In the diagram above you see a single device sending a frame and that frame being flooded to all other active ports.  This works quite well for small networks consisting of a single hub and low port count, but you can easily see where problems start to arise as the network grows.


Once multiple hubs are connected and the network grows each hub will flood every frame, and all devices will receive these frames regardless of whether they are the intended recipient.  This causes major overhead in the network due to the unneeded frames consuming bandwidth.


The next step in the network evolution is called bridging and was designed to alleviate this problem and decrease the overhead of forwarding unneeded frames.  A bridge is a device that makes an intelligent decision on when and where to flood frames based on MAC addresses stored in a table.  These MAC addresses can be static (manually input) or dynamic (learned on the fly.)  Because it is more common we will focus on dynamic.  The original bridges were typically 2 or more ports (low port counts) and could separate MAC addresses using the table for those ports.


In the above diagram you see a hub operating normally on the left flooding the frame to all active ports.  When the frame is received by the bridge a MAC address lookup is done on the MAC table and the bridge makes a decision whether or not to flood to the other side of the network.  Because the frame in this example is destined for a MAC address existing on the left side of the network the bridge does not flood the frame.  These addresses will be learned dynamically as devices send frames.  If the destination MAC address had been a device on the right side of the network the bridge would have sent the frame to that side to be flooded by the hub.

Bridges reduced unnecessary network traffic between groups or departments while allowing resource sharing when needed.  The limitation of original bridges came from the low port counts and changing data patterns.  Because the bridges were typically only separating 2-4 networks there was still quite a bit of flooding, especially when more and more resources were shared across groups.


Switches are the next evolution of bridges and the operation they perform is still considered bridging.  In very basic terms a switch is a high port-count bridge that is able to make decisions on a port-by-port basis.  A switch maintains a MAC table and only forwards frames to the appropriate port based on the destination MAC.  If the switch has not yet learned the destination MAC it will flood the frame.  Switches and bridges will also flood multi-cast (traffic destined for multiple recipients) and broadcast (traffic destined for all recipients) frames which are beyond the scope of this discussion.


In the diagram above I have added several components to clarify switching operations now that we are familiar with basic bridging.  Starting in the top left of the diagram you see some of the information that is contained in the header of an Ethernet frame.  In this case it is the source and destination MAC addresses of two of the devices connected to the switch.  Each end-point in the above diagram is labeled with a MAC address starting with AF:AF:AF:AF:AF.  In the top right we see a representation of a MAC table which is stored on the switch and learned dynamically.  The MAC table contains a listing of which MAC addresses are known to be on each port.  Because the MAC table in this example is fully populated we can assume that the switch has previously seen a frame from each device in order to populate the table.  That auto-population is the ‘dynamic learning’ and it is done by recording the source MAC address of incoming frames.  Lastly we see that the frame being sent by the device on port 1 is only being forwarded to the device on port 2.  In the event port 2’s MAC address had not yet been learned, the switch would be forced to flood the frame to all ports except the one it received it on in order to ensure it was received by the destination device.
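The learn/forward/flood behavior described above can be sketched in a few lines of Python.  The class and method names are mine, chosen purely for illustration:

```python
class LearningSwitch:
    """Minimal sketch of a learning switch: record the source port,
    forward to a known destination, flood unknown destinations."""
    def __init__(self, ports):
        self.ports = ports
        self.mac_table = {}  # MAC address -> port number

    def receive(self, in_port, src_mac, dst_mac):
        self.mac_table[src_mac] = in_port                # dynamic learning
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]             # forward out one known port
        return [p for p in self.ports if p != in_port]   # flood all other ports

sw = LearningSwitch(ports=[1, 2, 3, 4])
# Unknown destination: flood to every port except the ingress port.
print(sw.receive(1, "AF:AF:AF:AF:AF:01", "AF:AF:AF:AF:AF:02"))  # -> [2, 3, 4]
# The reply's destination was learned from the first frame: one port only.
print(sw.receive(2, "AF:AF:AF:AF:AF:02", "AF:AF:AF:AF:AF:01"))  # -> [1]
```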

So far we’ve learned that bridges improved upon hubs, and switches improved upon basic bridging.  The next kink in the evolution of Ethernet LANs came as our networks grew beyond single switches and we began adding in redundancy.

The three issues that arose can all be grouped as problems with network loops (specifically Layer 2 Ethernet loops.)  These issues are:

Multiple Frame Copies:

When a device receives the same frame more than once due to replication or loop issues it is a multiple frame copy.  This can cause issues for some hardware and software and also consumes additional unnecessary bandwidth.

MAC Address Instability:

When a switch must repeatedly change its MAC table entry for a given device this is considered MAC address instability.

Broadcast Storms:

Broadcast storms are the most serious of the three issues as they can literally bring all traffic to a halt.  If you ask someone who has been doing networking for quite some time how they troubleshoot a broadcast storm you are quite likely to hear ‘Unplug everything and plug things back in one at a time until you find the offending device.’  The reason for this is that in the past the storm itself would soak up all available bandwidth leaving no means to access switching equipment in order to troubleshoot the issue.  Most major vendors now provide protection against this level of problem but storms are still a serious problem that can have a major performance impact on production data.  Broadcast storms are caused when a broadcast, multi-cast or flooded frame is repeatedly forwarded and replicated by one or more switches.

In the diagram above we can see a switched loop.  We can also observe several stages of frame forwarding, starting with device 1 in the top left sending a frame to device 2 in the top right.

  1. Device 1 forwards a frame to device 2.  This one-to-one communication is known as unicast.
  2. The switch on the top left does not yet have device 2 in its MAC table therefore it is forced to flood the frame, meaning replicate the frame to all ports except the one where it was received. 
  3. In stage three we see two separate things occur:
    1. The switch in the top right delivers the frame to the intended device (for simplicity’s sake we are assuming the switch in the top right already has a MAC table entry for the device.)
    2. The bottom switch having received the frame forwards the frame to the switch in the top right.
  4. The switch in the top right receives the second copy and forwards it based on its MAC table, delivering the second copy of the same frame to device 2.



The above example has a little more going on and can become confusing quickly.  For the purposes of this example assume all three switches have blank MAC address tables with no devices known.  Also remember that they are building the MAC table dynamically based on the source MAC address they see in a frame.  To aid in understanding I will fill out the MAC tables at each step.

1. Our first stage is the easy one.  Device 1 forwards a unicast frame to device 2.  Switch A receives this frame on the top port.


2. When switch A receives the frame it checks its MAC table for the correct port to forward frames to device 2.  Because its MAC table is currently blank it must flood the frame (replicate it to all ports except the one where it was received.)  As it floods the frame it also records the MAC address and attached port of device 1 because it has seen this MAC as the source in the frame.


3. In stage 3 two switches receive the frame and must make decisions. 

  1. Switch C having a blank MAC table must flood the frame.  Because there is only one port other than the one it received it on switch C floods the frame to the only available port, at the same time it records the source MAC address as having been received on its port 1.  
  2. Switch B also receives the frame from switch A, and must make a decision.  Like switch C, switch B has no records in its MAC table and must flood the frame.  It floods the frame down to switch C, and up to device 2.  At the same time switch B records the source MAC in its MAC table.


4. In the fourth stage we again have several things happening. 

  1. Switch C has received the same frame for the second time, this time from port 2.  Because it still has not seen the destination device it must flood the frame.  Additionally, because this is the exact same frame, switch C sees the MAC address of device 1 coming from its right port, port 2, and assumes the device has moved.  This forces switch C to change its MAC table. 
  2. At the same time Switch B receives another copy of the frame.  Switch B seeing the same source address must change its MAC table and because it still does not have the destination MAC in the table it must flood the frame again.


In the above diagram pay close attention to the fact that the MAC tables have been changed for switch B and C.  Because they saw the same frame come from a different port they must assume the device has moved and change the table.  Additionally because the cycle has not been completed the loop will continue and this is one way broadcast storms begin.  More and more of these endless loops hit the network until there is no bandwidth left to serve data frames. 
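The MAC address instability piece of this is easy to demonstrate in isolation.  This sketch (variable names mine) shows a switch’s table entry flapping as the same looping frame arrives alternately on two different ports:

```python
# Sketch of MAC address instability: the same source MAC arrives on
# different ports as a frame loops, so the table entry keeps changing.
mac_table = {}
history = []

def learn(port, src_mac):
    if mac_table.get(src_mac) != port:
        mac_table[src_mac] = port  # the switch assumes the device "moved"
        history.append(port)

# The same looping frame from device 1 arrives alternately on ports 1 and 2:
for port in [1, 2, 1, 2, 1]:
    learn(port, "AF:AF:AF:AF:AF:01")

print(history)  # -> [1, 2, 1, 2, 1]  -- the entry never stabilizes
```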

In this simple example it may seem that the easy solution is to not build loops like the triangle in my diagram.  This is actually the premise of the next Ethernet evolution we’ll discuss, but first let’s look at how easy it is to create loops just by adding redundancy.


In the diagram above we start with a non-redundant switch link.  This link is a single point of failure, and in the event a component fails, devices on separate switches will be unable to communicate.  The simple solution is adding a second port for redundancy, with the assumed added benefit of having more bandwidth.  In reality, without another mechanism in place, adding the second link turns the physical view on the bottom left into the logical view on the bottom right, which is a loop.  This is where the next evolution comes into play.

Spanning-Tree Protocol (STP):

STP is defined in IEEE 802.1d and provides an automated method for building loop free topologies based on a very simple algorithm.  The premise is to allow the switches to automatically configure a loop free topology by placing redundant links in a blocked state.  Like a tree, this loop free topology is built up from the root (root bridge) and branches out (switches) to the leaves (end-nodes), with only one path to each end-node.

The way Spanning-Tree does this is by detecting redundant links and placing them in a ‘blocked’ state.  This means that the ports do not send or receive frames.  In the event of a primary link failure (designated port) the blocked port is brought online.  The issue with Spanning-Tree is twofold:

  • Because it blocks ports to prevent loops potential bandwidth is wasted.
  • In failure events Spanning-Tree can take up to 50 seconds to bring the blocked port into an active state, meaning a potential 50 seconds of downtime for the link.
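The first step of building that tree is electing the root bridge, which in 802.1d comes down to the lowest bridge ID: lowest priority wins, with the MAC address as the tiebreaker.  A tiny sketch (the bridge names and values are made up for illustration):

```python
# Root bridge election sketch: lowest (priority, MAC) tuple wins.
bridges = [
    {"name": "A", "priority": 32768, "mac": "00:00:00:00:00:0A"},
    {"name": "B", "priority": 32768, "mac": "00:00:00:00:00:0B"},
    {"name": "C", "priority": 4096,  "mac": "00:00:00:00:00:0C"},
]

root = min(bridges, key=lambda b: (b["priority"], b["mac"]))
print(root["name"])  # prints "C": lowest priority wins despite the higher MAC
```

This is why administrators lower the priority on the switch they want as the root rather than leaving the election to chance.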

Multiple versions of STP have been implemented and standardized to improve upon the original 802.1d specification.  These include:

Per-VLAN Spanning-Tree Protocol (PVSTP):

Performs the blocking algorithm independently for each VLAN allowing greater bandwidth utilization.

Rapid Spanning-Tree Protocol (RSTP):

Uses additional port-types not in the original STP specification to allow faster convergence during failure events.

Per-VLAN Rapid Spanning-Tree (PVRSTP):

Provides rapid spanning-tree functionality on a per VLAN basis.

Other STP implementations exist and the details of STP operation in each of its flavors is beyond the scope of what I intend to cover with the 101 series.  If there is a demand these concepts may be covered in a more in-depth 202 series once this series is completed.


Ethernet networking has evolved quite a bit over the years and is still a work in progress.  Understanding the how’s and why’s of where we are today will help in understanding the advancements that continue to come.  If you have any comments, questions, or corrections please leave them in the comments or contact me in any of the ways listed on the about page.


How Emulex Broke Out of the ‘Card Pusher’ Box

A few years back when my primary responsibility was architecting server, blade, SAN, and virtualization solutions for customers, I selected the appropriate HBA based on the following rule: whichever (QLogic or Emulex) is less expensive today through the server OEM I’m using.  I had no technical or personal preference for one or the other.  They were both stable, performed, and allowed my customers to do what they needed to do.  On any given day one might show higher performance than another, but that’s always subject to the testing criteria and will be fairly irrelevant for a great deal of customers.  At that point I considered them both ‘Card Pushers.’

Last year I had the opportunity to speak at two Emulex Partner product launch events in the UK and Germany.  My presentation was a vendor independent technical discussion on the drivers for consolidating disparate networks on 10GE and above.  I had no prior knowledge of the exact nature of the product being launched, and didn’t expect anything more than a Gen 2 single chip CNA, nothing to get excited over.  I was wrong.

Sitting through the Key Note presentations by Emulex executives I quickly realized OneConnect was something totally different, and with it Emulex was doing two things:

  1. Betting the farm on Ethernet
  2. Rebranding themselves as more than just a card pusher.

Now, just to get this out of the way: Emulex did not, has not, and to my knowledge will not stop pursuing better and faster FC technology; their 4Gb and 8Gb FC HBAs are still rock solid, high performance, pure FC cards.  What they were, however, doing is obviously placing a large bet (and R&D investment) on Ethernet as a whole.


The Emulex OneConnect is a Generation 2 Converged Network Adapter (CNA), but it’s a lot more than that.  It also does TCP offload, operates as an iSCSI HBA, and handles FCoE including the full suite of DCB standards.  It’s the Baskin Robbins of I/O interface cards, although admittedly no FCoTR support 😉 (http://www.definethecloud.net/?p=380).  The technology behind the card impressed me, but the licensing model is what makes it matter.  With all that technology built into the hardware you’d expect a nice hefty price tag to go with it.  That’s not the case with the OneConnect card: the licensing options allow you to buy the card at a cost equivalent to competing 10GE NICs and license iSCSI or FCoE if/when desired (licensing models may vary with OEMs.)  This means Emulex, a Fibre Channel HBA vendor, is happy to sell you a high performance 10GE NIC.  In IT there is never one tool for every job, but as far as I/O cards go this one comes close.

You don’t have to take my word for it when it comes to how good this card is; HP’s decision to integrate it into blade and rack mount system boards speaks volumes.  Take a look at Thomas Jones’ post on the Emulex Federal Blog for more info (http://www.emulex.com/blogs/federal/2010/07/13/the-little-trophy-that-meant-a-lot/).  Additionally, Cisco is shipping OneConnect options for UCS blades and rack mounts, and IBM also OEMs the product.

In addition to the OneConnect launch Emulex has also driven to expand their market into other areas, products like OneCommand Vision promise to provide better network I/O monitoring and management tools, and are uniquely positioned to do this through the eyes of the OneConnect adapter which can see all networks connected to the server.


Overall, Emulex has truly moved outside of the ‘Card Pusher’ box and uniquely positioned themselves above their peers.  In a data center market where many traditional Fibre Channel vendors are clinging to pure FC like a sinking ship, Emulex has embraced 10GE and offers a product that lets the customer choose the consolidation method or methods that work for them.


FCoTR a Storage Revolution

As the industry has rapidly standardized and pushed adoption of Fibre Channel over Ethernet (FCoE), there continue to be many skeptics.  Many Fibre Channel gurus balk at the idea of Ethernet being capable of guaranteeing the right level of lossless delivery and performance required for the SCSI data their disks need.  IP junkies like Greg Ferro (http://etherealmind.com/) balk at the idea of changing Ethernet in any way and insist that IP can solve all the world’s problems, including world hunger (Sally Struthers over IP, SSoIP.)  Additionally there is a fear from some storage professionals of having to learn Ethernet networks or being displaced by their network counterparts.

In steps Fibre Channel over Token Ring (FCoTR.)  FCoTR promises to provide collisionless delivery using proven Token Ring networks.  FCoTR is proposed by industry recognized experts: E. Banks, K. Houston, S. Foskett, R. Plankers and W. C. Preston to solve the issues mentioned above and provide a network that can converge Fibre Channel onto Token Ring while maintaining the purity of IP and providing job protection to storage administrators.  FCoTR is synergistic network convergence for Data Center 3.0 and Cloud Computing.

FCoTR has taken the fast track into the public eye and will be interesting to watch as it evolves.  If IBM plays their cards right they may be able to ride this wave into displacing Cisco and regaining their dominance in that space.  For more information on FCoTR:


The Art of Pre-Sales

On a recent customer call being led by a vendor account manager and engineer I witnessed some key mistakes by the engineer as he presented the technology to the customer.  None of the mistakes were glaring or show stopping but they definitely kept the conversation from having the value that was potentially there.  That conversation got me thinking about the skills and principles that need to be applied to pre-sales engineering and prompted this blog.

Pre-sales engineering in all of its many forms is truly an art.  There is definitely science and methodology behind its success, but practicing those methods and studying that science alone won’t get you far past good.  To be great you need to invest effort into the technology, the business, and most importantly your personal style.  If you’re already good at pre-sales and don’t care to be great, then the rest of this blog won’t help you.  If you’re an ‘end-user’ or customer that deals with pre-sales engineers, this blog may help you understand a little of what goes through the heads of the guys on the other side of the conference table.  If your job is post-sales, implementations, managed services, etc., this may give you an idea of what your counterparts are doing.  If you’re a pre-sales engineer who could use some new ideas or tools, this blog’s for you.

Joe’s 5 rules of Pre-Sales Engineering:

  • You are a member of the sales team
  • You are not a salesperson
  • You must be Business Relevant
  • You must be Technically Knowledgeable
  • Know your audience

These are really rules of thumb that I use to get into the right mindset when engaging with customers in a pre-sales fashion.  They aren’t set in stone, all-encompassing, or agreed upon by teams of experts; they’re just tools I use.  Let’s start with a quick look into each rule:

You are a member of the sales team:

This one is key to remember because for a lot of very technical people that move into pre-sales roles this is tough to grasp.  There is not always love, drum circles, group hugs and special brownies between sales and engineering, and some engineers tend to resent sales people for various reasons (and vice versa.)  Whether or not there is resentment, it’s natural to be proud of your technical skill set, and thinking of yourself in a sales perspective may not be something you’re comfortable with.  Get over it or get out of pre-sales.  As a pre-sales engineer it’s your job to act as a member of the sales team, assisting account managers in the sale of the products and services your company provides.  You are there to drive the sales that provide the blanket of revenue the rest of the company rises and sleeps under (if you missed that reference watch the video, it’s worth it: http://bit.ly/dqTzU7.)

You are not a salesperson:

Now that you’ve swallowed the fact that you’re a member of the sales team, it’s time to enforce the fact that you are not an account manager/sales representative, etc.  This is vitally important; in fact, if you can apply only the first two rules you’ll be significantly better than some of your peers.  I’m going to use the term AM (Account Manager) for sales from here on out; allow this to encompass any non-technical sales title that fits your role.  An AM and a pre-sales SE are completely different roles with a common goal.  An AM is tightly tied to a target sales number and most likely spends hours on con calls talking about that number and why they are or aren’t at that number.  An AM’s core job is to maintain customer relationships and sell what the company sells.

A pre-sales engineer’s job, on the other hand, is a totally different beast.  While you do need to support your AM, it’s your job to make sure that the product, service or solution you sell is relevant, effective, right-fit, and complete for the particular customer.  In the reseller world we talk about becoming a ‘Trusted Advisor,’ but that ‘Trusted Advisor’ is typically a two person team consisting of an AM and engineer who know the customer well, understand their environment, and maintain a mutually beneficial relationship.

As the engineer side of that perfect team it’s your job to have the IDEA:

  • Identify
  • Design
  • Evangelize
  • Adjust

Note: Before continuing I have to apologize for the fact that I just created one of those word acronym BS objects…

So what’s the bright IDEA?  As a pre-sales engineer you need to identify customer requirements, design a product set or solution to meet those requirements, evangelize the proposed solution, and adjust the solution as necessary with the customer. 

You must be business relevant:

This is typically another tough thing to do from an engineer’s standpoint.  Understanding business requirements and applying the technology to those requirements does not come naturally for most engineers, but it is vital to success.  Great technology alone has no value; the data center landscape is littered with stories of great technology companies that failed because they couldn’t capitalize by making the technology business relevant.  The same lesson applies to pre-sales engineering.

To be a great pre-sales engineer you have to understand both business and technology enough to map the technical benefits to actual business requirements.  So what if your widget is faster than all other widgets before it; what does that mean to my business, and my job?  A great way to begin to understand the high level business requirements and what the executives of the companies you sell into are thinking is to incorporate business books and magazines into your reading.  Next time you’re at the airport magazine rack looking at the latest trade rag, grab a copy of ‘The Harvard Business Review’ instead.

You must be technically knowledgeable:

This part should go without saying but unfortunately is not always adhered to.  Far too often I see engineers reading from the slides they present because they don’t know the products or material they are presenting.  Maintaining an appropriate level of technical knowledge becomes harder and harder as more products are thrown at you, but you must do it anyway.  If you can’t speak to a product’s or solution’s features and benefits without slides or data sheets, you shouldn’t be speaking about it.

Staying up-to-date is a daunting task, but there is a plethora of resources out there for it.  Blogs and Twitter can serve as a constant stream of the latest and greatest technical information.  Add formal training and vendor documentation, and the tools to stay technically relevant are all there.  The best advice I can offer on staying technically knowledgeable is to not be afraid to say you don’t know.  If you need training, ask for it; if you need info, find someone who knows it and talk to them.  Just as importantly, work to share your expertise with others, as it creates a collaborative environment that benefits everyone.

Know your audience:

This may be the most important of the five rules and boils down to doing your homework and being applicable.  Ensure you’ve researched your customer, their requirements, and their environment as much as possible.  Know what their interests and pain points are before walking into a meeting whenever possible.

Knowing your audience also applies during customer meetings.  As the customer provides more information it’s important to tailor what you present to that customer’s interests on the fly.  Any technical conversation should be a fluid entity, ebbing and flowing with the customer’s feedback.

Practicing the art:

Like any other art, pre-sales must be practiced.  You must study the products and services your company sells, develop your presentation skills, and constantly work on your communication.  From my perspective the best way to build all of these skills at once is whiteboarding.  Whiteboards are the greatest tool in a pre-sales engineer’s arsenal.  They provide a clean canvas on which you can paint the picture of a solution and remain fluid in any given conversation.  Unlike slides, whiteboard sessions are flexible and can easily stay focused on what the customer wants to hear.  I firmly believe that a pre-sales engineer should not discuss any technology they cannot confidently articulate on the whiteboard.  You can’t take this concept too far: I’ve instructed 5-day data center classes 100% on the whiteboard, covering LAN, SAN, storage, servers, and networking, because it was the right fit for the audience.  The whiteboard is your friend.

If you don’t have a whiteboard in your home, get one.  Use it to hone your skills, visualize architectures, and practice before meetings.  Look through the slides you typically present and practice conveying the same messaging on the whiteboard without cues.  As you become comfortable having technical discussions at the whiteboard you’ll find you can convey a greater level of technical information, tailored to the customer’s needs, much faster.  Whiteboards also don’t require slides, projectors, or power, and they don’t suffer from technical difficulties.

As you whiteboard in front of customers, think of painting a picture for them: start with broad strokes outlining the technology, then add detail in the areas where the customer shows interest.  Drill down into only the specifics that are relevant to that customer; this is where knowing your audience is key.


In the diagram above you can see the way the conversation should go with a customer.  You begin at the top-level big picture and drill down into only the points that the customer shows an interest in or that are applicable to their data center and job role.  Don’t ever feel the need to discuss every feature of a product or solution, because they are not all relevant to every customer.  For instance, a server admin probably doesn’t care how fast switching occurs, but the network and application teams probably do.  Maybe your product can save a ton of cost; great, but that’s probably not very relevant to administrators who aren’t responsible for budget.  Always ensure you’re maintaining relevance to the audience and the business.


Pre-sales, like any other skill set, must be honed and practiced.  It doesn’t come overnight, and as with anything else, you’re never as good as you can be.  Build a style and methodology that work for you, and don’t be afraid to change or modify them as you find areas for improvement.  The better you get at it, the more value you’re giving your customer, team, and company.


Data Center 101: Server Systems

As the industry moves deeper and deeper into virtualization, automation, and cloud architectures it forces us as engineers to break free of our traditional silos.  For years many of us did quite well being experts in one discipline with little to no knowledge of another.  Cloud computing, virtualization, and other current technological and business initiatives are forcing us to branch out beyond our traditional knowledge set and understand more of the data center architecture as a whole.

It was this concept that gave me the idea to start a new series on the blog covering the foundation topics of each of the key areas of the data center.  These will be lessons designed from the ground up to give you familiarity with a new subject or a refresh on an old one.  Depending on your background some, none, or all of these may be useful to you.  As we get further through the series I will be looking for experts to post on subjects I’m not as familiar with; WAN and security are two that come to mind.  If you’re interested in writing a beginner’s lesson on one of those topics, or any other, please comment or contact me directly.

Server Systems:

As I’ve said before in previous posts, the application is truly the heart of the data center.  Applications are the reason we build servers, networks, and storage systems.  Applications are the email systems, databases, web content, etc., that run our businesses.  Applications run within the confines of an operating system, which interfaces directly with server hardware and firmware (discussed later) and provides a platform on which to run the application.  Operating systems come in many types, commonly Unix, Linux, and Windows, with other variants used for specialized purposes such as mainframes and supercomputers.

Because the server sits closer to the application than any other hardware, understanding server hardware and functionality is key.  Server hardware breaks down into several major components and concepts.  For this discussion we will stick with the common AMD/Intel architecture known as x86.

    System board (motherboard) – All components of a server connect via the system board.  The system board itself is a circuit board with specialized connectors for the server subcomponents, and it provides connectivity between each component of the server.
    Central Processing Unit (CPU) – The CPU is the workhorse of the server system, performing the calculations that allow the operating system and application to run.  Whatever work is being done by an application is being processed by the CPU.  A CPU is placed in a socket on the system board, and each socket holds one CPU.
    Random Access Memory (RAM) – RAM is where data being used by the operating system and application, but not currently being processed, is stored.  For instance, the term ‘load’ typically refers to moving data from permanent storage (disk) into memory, where it can be accessed faster.  Memory is electronic and can be accessed very quickly, but it requires active power to maintain data, which is why it is known as volatile.
    Disk – Disk is permanent storage media traditionally comprised of magnetic platters known as disks.  Other types of disks exist, including Flash disks, which provide much greater performance at a higher cost.  The key to disk storage is that it is non-volatile and does not require power to maintain data.

    Disk can either be internal to the server or external in a separate device.  Commonly server disk is consolidated in central storage arrays attached by a specialized network or network protocol.  Storage and storage networks will be discussed later in this series.

    Input/Output (I/O) – I/O comprises the methods of getting data in and out of the server.  I/O comes in many shapes and sizes, but the two primary methods used in today’s data centers are Local Area Networks (LAN) using Ethernet as the underlying protocol, and Storage Area Networks (SAN) using Fibre Channel as the underlying protocol (both networks will be discussed later in this series.)  These networks attach to the server using I/O ports typically found on expansion cards.
    System bus – The system bus is the series of paths that connect the CPU to memory.  It is specific to the CPU vendor.
    I/O bus – The I/O bus is the path that connects the expansion (I/O) cards to the CPU and memory.  Several standards exist for these connections, allowing multiple vendors to interoperate without issue.  The most common bus type in modern servers is the PCI Express (PCIe) standard, which supports greater bandwidth than previous bus types, allowing higher bandwidth networks to be used.
    Firmware – Firmware is low-level software that is commonly hard-coded onto hardware chips.  Firmware runs the hardware device at a low level and interfaces with the BIOS.  In most modern server components the firmware can be updated through a process called ‘flashing.’
    Basic I/O System (BIOS) – The BIOS is a type of firmware stored in a chip on the system board.  It is the first code loaded when a server boots and is primarily responsible for initializing hardware and loading an operating system.



The diagram above shows a two-socket server.  Starting at the bottom you can see the disks, in this case internal Hard Disk Drives (HDD.)  Moving up you can see two sets of memory and CPUs, followed by the I/O cards and power supplies.  The power supplies convert A/C current to the appropriate D/C levels for use in the system.  Not shown are the fans that move air through the system for cooling.

The bus systems, which are not shown, would be a series of traces and chips on the system board allowing separate components to communicate.

A Quick Note About Processors:

Processors come in many shapes and sizes and were traditionally rated by speed measured in hertz.  Over the last few years a new concept has been added to processors: ‘cores.’  Simply put, a core is a CPU placed on a chip beside other cores, each sharing certain components such as cache and the memory controller (both outside the scope of this discussion.)  If a processor has 2 cores it operates as if it were 2 physically independent, identical processors, and provides the advantages of such.

Another technology, called hyper-threading, has been around for quite some time.  A processor can traditionally only process one calculation per cycle (measured in hertz); this is known as a thread.  Many threads use only a small portion of the processor, leaving other portions idle.  Hyper-threading allows a processor to schedule 2 threads in the same cycle as long as they don’t require overlapping portions of the processor.  For applications that are able to utilize multiple threads, hyper-threading provides an average increase of approximately 30%, whereas a second core would double performance.

Hyper-threading and multiple cores are not mutually exclusive and can be used together.  For instance, in the diagram above, if both installed processors were 4-core processors that would provide 8 total cores; with hyper-threading enabled it would provide a total of 16 logical cores.

Not all applications and operating systems can take advantage of multiple processors and cores, therefore it is not always advantageous to have more cores or processors.  Proper application sizing and tuning is required to properly match the number of cores to the task at hand.
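The socket/core/thread arithmetic above is easy to sketch in a few lines of Python.  The function name and parameters here are my own, purely for illustration:

```python
def logical_processors(sockets, cores_per_socket, hyperthreading=False):
    """Logical processors the OS will see: sockets x cores per socket,
    doubled when hyper-threading is enabled (illustrative sketch)."""
    threads_per_core = 2 if hyperthreading else 1
    return sockets * cores_per_socket * threads_per_core

# The two-socket, quad-core example from the text:
print(logical_processors(2, 4))                       # 8 physical cores
print(logical_processors(2, 4, hyperthreading=True))  # 16 logical cores
```

Remember from the discussion above that the second line does not mean double the performance; each hyper-threaded pair shares one physical core.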


Server Startup:

When a server is first powered on, the BIOS is loaded from EEPROM (Electronically Erasable Programmable Read-Only Memory) located on the system board.  While the BIOS is in control it performs a series of Power On Self Tests (POST) ensuring the basic operability of the main system components.  From there it detects and initializes key components such as the keyboard, video, and mouse.  Last, the BIOS searches through the available bootable media for a device containing a valid Master Boot Record (MBR.)  It then loads that record and allows its code to take over and load the operating system.

The order and devices the BIOS searches is configurable in the BIOS settings.  Typical boot devices are:

  • USB
  • Internal Disk
  • Internal Flash
  • Fibre Channel SAN

Boot order is very important when there is more than one available boot device, for instance when booting to a CD-ROM to perform recovery of an operating system that is already installed.  It is also important to note that both iSCSI and Fibre Channel network-attached disks are handled by the operating system as if they were internal Small Computer System Interface (SCSI) disks.  This becomes very important when configuring non-local boot devices.  SCSI as a whole will be covered later in this series.
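The boot-order search can be sketched as a simple loop.  This is a hypothetical illustration of the logic, not actual BIOS code; the device names and dictionary layout are my own:

```python
def find_boot_device(boot_order, devices):
    """Walk the configured boot order and return the first device
    that reports a valid MBR, mimicking the BIOS search described
    above.  Returns None if no bootable device is found."""
    for name in boot_order:
        dev = devices.get(name)
        if dev is not None and dev.get("valid_mbr"):
            return name
    return None

boot_order = ["USB", "Internal Disk", "Internal Flash", "Fibre Channel SAN"]
devices = {
    "Internal Disk": {"valid_mbr": True},
    "Fibre Channel SAN": {"valid_mbr": True},
}
print(find_boot_device(boot_order, devices))  # Internal Disk: earlier in the order
```

Note how the configured order, not the set of bootable devices, decides the winner; that is exactly why a recovery CD must be placed ahead of the internal disk to boot.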

Operating System:

Once the BIOS has finished getting things ready and transferred control to the bootable data in the MBR, that code takes over.  This is the operating system (OS.)  The OS is the interface between the user/administrator and the server hardware.  It provides a common platform for various applications to run on and handles the interface between those applications and the hardware.  In order to properly interface with hardware components the OS requires drivers for that hardware.  Essentially, drivers are OS-level software that allow any application running in the OS to properly interface with the firmware running on the hardware.


Applications come in many different forms to provide a wide variety of services.  Applications are the core of the data center and are typically the most difficult piece to understand.  Each application whether commercially available or custom built has unique requirements.  Different applications have different considerations for processor, memory, disk, and I/O.  These considerations become very important when looking at new architectures because any change in the data center can have significant effect on application performance.


The server architecture runs from the I/O inputs, through the server hardware, to the application stack.  A proper understanding of this architecture is vital to application performance, and applications are the purpose of the data center.  Servers consist of a set of major components: CPUs to process data, RAM to hold data for fast access, I/O devices to get data in and out, and disk to store data permanently.  This system is put together for the purpose of serving an application.

This post is the first in a series intended to build a foundation in data center topics.  If you’re starting from scratch they may all be useful; if you’re familiar with one or two aspects, pick and choose.  If this series becomes popular I may do a 202 series as a follow-on.  If I missed something here, or made a mistake, please comment.  Also, if you’re a subject matter expert in a data center area and would like to contribute a foundation blog in this series, please comment or contact me.


Have We Taken Data Redundancy too Far?

During a recent conversation about disk configuration and data redundancy on a storage array I began to think about everything we put into data redundancy.  The question that came to mind is the title of this post ‘Have we taken data redundancy too far?’

Now don’t get me wrong, I love data as much as the next fellow, and I definitely understand its importance to the business and to compliance.  I’m not advocating tossing out redundancy or data protection.  My question is: when is enough enough, and is there a better way?

To put this in perspective let’s take a look at everything that stacks up to protect enterprise data:


We start with the lowly disk, which by itself has no redundancy.  While disks tend to have one of the highest failure rates in the data center, they have definitely come a long way.  Many have the ability to self-protect and warn of impending failure at a low level, and they can last for years without issue.

Disks alone are a single point-of-failure in which all data on the disk is lost if the drive fails.  Because of this we’ve worked to come up with better ways to apply redundancy to the data.  The simplest form of this is RAID.


RAID stands for ‘Redundant Array of Inexpensive Disks’; it’s also correct to say ‘Redundant Array of Independent Disks.’  No matter what you call it, RAID allows what would typically be a single disk on its own to act as part of a group of disks for the purposes of redundancy, performance, or both.  You can think of this like disk clustering.

Some common RAID types used for redundancy are:

    • RAID 0 – Disk striping: data is striped across two or more disks to improve performance.  There is no redundancy; each disk is a single point of failure for the entirety of the stored data.
    • RAID 1 – Disk mirroring: data is written to both disks simultaneously as an exact copy, allowing either disk to fail with no data loss.
    • RAID 5 – Disk striping with parity: using three or more disks, data is written in stripes across the available disks, and additional parity data is striped across them to provide redundancy.  Because of the parity data, one disk can be lost from the group without data loss.  RAID 5 yields N-1 capacity, meaning you lose one disk’s worth of space to parity.
    • RAID 6 – Disk striping with double parity: think RAID 5 with an extra disk’s worth of parity and the ability to lose two disks without data loss.  This yields N-2 capacity.

In many cases ‘hot-spares’ will also be added to the RAID groups.  The purpose of a hot-spare is to have a drive online but not participating in the RAID, ready for failure events.  If a RAID disk fails, the hot-spare can replace it immediately until an administrator can swap out the bad drive.
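The capacity math for these RAID levels can be captured in a short Python sketch.  This is a simplification for illustration: it assumes identical disks and ignores formatting overhead.

```python
def usable_tb(raid_level, disks, disk_tb, hot_spares=0):
    """Usable capacity in TB for the common RAID levels described
    above.  Simplified: equal-size disks, no formatting overhead."""
    n = disks - hot_spares            # disks actually in the RAID group
    if raid_level == 0:
        data_disks = n                # striping only: no redundancy
    elif raid_level == 1:
        data_disks = n / 2            # mirroring: half the raw space
    elif raid_level == 5:
        data_disks = n - 1            # N-1: one disk's worth of parity
    elif raid_level == 6:
        data_disks = n - 2            # N-2: two disks' worth of parity
    else:
        raise ValueError("unsupported RAID level")
    return data_disks * disk_tb

# Eight 2 TB disks in RAID 5 with one hot spare: (8 - 1 - 1) * 2 = 12 TB
print(usable_tb(5, 8, 2, hot_spares=1))  # 12
```

Run the same eight disks through RAID 1 or RAID 6 and you can see how quickly raw capacity disappears into redundancy.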


Another level of redundancy many enterprise storage arrays use is snapshots.  Snapshots can be used to perform point-in-time recoveries.  Basically, when a snapshot is taken it locks the associated blocks of data, ensuring they are not modified without being copied.  If a block needs to be changed, the new data is written to a new location without affecting the original.  In order to revert to a snapshot, the changed data is simply discarded, leaving the original locked blocks.  While snapshots are not a backup or redundancy feature on their own, they can be used as part of other systems, and they are excellent for development environments where testing is required against various data sets.  Snapshots consume additional space because two copies are kept of any locked block that is changed.
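That copy-on-write behavior can be illustrated with a toy Python class.  This is a hypothetical sketch of the concept, not any array vendor’s implementation; a real array locks and copies at the block level rather than copying a whole map.

```python
class SnapshotVolume:
    """Toy copy-on-write volume: taking a snapshot pins the current
    block map, later writes change the live view only, and a revert
    simply discards the changed data."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)   # live view: address -> data
        self.snapshot = None         # pinned point-in-time view

    def take_snapshot(self):
        self.snapshot = dict(self.blocks)      # lock the current blocks

    def write(self, addr, data):
        self.blocks[addr] = data               # pinned copy untouched

    def revert(self):
        if self.snapshot is not None:
            self.blocks = dict(self.snapshot)  # drop the changed data

vol = SnapshotVolume({0: "ledger-v1"})
vol.take_snapshot()
vol.write(0, "ledger-v2")   # live view changes, snapshot does not
vol.revert()
print(vol.blocks[0])        # ledger-v1
```

Notice that between the snapshot and the revert, two copies of block 0 exist at once, which is exactly where the extra space consumption comes from.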

Primary/Secondary replication:

Another method for creating data redundancy is tiered storage.  In a tiered redundancy model the primary storage serving the applications is held on the highest-performing disk, and data is backed up or replicated to lower-performance, less expensive disk or disk arrays.

Virtual Tape Libraries (VTL):

Virtual tape libraries are storage arrays that present themselves as standard tape libraries for the purposes of backup and archiving.  A VTL is typically used between primary storage and actual tape backups as a means of decreasing the backup window.

Tape backups:

In most cases the last stop for backup and archiving is still tape.  This is because tape is cheap, high density, and ultra-portable.  Large amounts of data can be streamed to tape libraries which can store the data and allow tapes to be sent to off-site storage facilities.

Adding it up:

When you put these redundancy and recovery systems together and start layering them on top of one another, you end up with high ratios of storage media purposed for redundancy and recovery compared to the actual data being served.  Ratios of 10:1, 20:1, 100:1 or more are not uncommon when comparing archive/redundancy space to usable space.
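To see how quickly the layers compound, here is a back-of-the-envelope Python sketch.  Every multiplier below is a hypothetical example chosen for illustration, not a measured figure from any real environment:

```python
# Hypothetical raw-to-usable multipliers, one per redundancy layer.
layers = {
    "RAID 6, 8-disk group (raw/usable = 8/6)": 8 / 6,
    "snapshot reserve (20% extra)": 1.2,
    "secondary replica (full copy)": 2.0,
    "VTL staging (full copy)": 2.0,
    "tape, 3 rotating sets": 3.0,
}

ratio = 1.0
for name, factor in layers.items():
    ratio *= factor    # the layers multiply, they don't add

print(round(ratio, 1))  # about 19 raw TB per usable TB under these assumptions
```

Even with these fairly modest per-layer numbers the stack lands near the 20:1 end of the range above, and swapping in more tape sets or wider replication pushes it toward 100:1.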


My summary is more of a repeat of the same question.  Have we taken this too far?  Do we need protection built in at each level, and layered on top of one another?  Can we afford to continue down this path adding redundancy at the expense of performance and utilization?  Should we throw higher parity RAID at our arrays and make up the performance hit with expensive cache?  Should we purchase 10TB of media for every 1TB we actually need to serve?  Is there a better way?

I don’t have the answer to this one, but I would love to see a discussion on it.  The way I’m thinking now is bunches of dumb independent disks, pooled and provisioned through software.  Drop the RAID and hot spares; use the software to maintain multiple local or global copies on different hardware.  When you move that disk thinking to cloud environments and start talking about petabytes or more of data, the current model starts unraveling quickly.


Why Cloud is as ‘Green’ As It Gets

I stumbled across a document from Greenpeace citing cloud for additional power draw and the need for more renewable energy (http://www.greenpeace.org/international/en/publications/reports/make-it-green-cloud-computing/.)  This is one of a series of pieces I’ve been noticing from the organization vilifying IT for its effect on the environment and chastising companies for new data centers.  These articles all strike a chord with me because they show a complete lack of understanding of what cloud is, does, and will do on the whole, especially where it concerns energy consumption and ‘green’ computing.

Greenpeace seems to be looking at cloud as additional hardware and data centers being built to serve more and more data.  While cloud is driving new equipment, new data centers, and larger computing infrastructures, it is doing so to consolidate computing overall.  Speaking of public cloud specifically, there is nothing more green than moving to a fully cloud infrastructure.  It’s not about a company adding new services; it’s about moving those services from underutilized internal systems onto highly optimized and highly utilized shared public infrastructure.

Another point they seem to be missing is the speed at which technology moves.  A state-of-the-art data center built 5-6 years ago would be lucky to reach a 1.5:1 Power Usage Effectiveness (PUE), whereas today’s state-of-the-art data centers can get to 1.2:1 or below.  This means that a new data center can potentially waste 0.3 kW or more less per kW of processing than one built 5-6 years ago.  Whether that’s renewable energy or not is irrelevant; it’s a good thing.
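The PUE arithmetic is simple enough to sketch.  The 1,000 kW IT load below is my own example figure, not something from the Greenpeace report:

```python
def facility_power_kw(it_load_kw, pue):
    """Total facility power drawn for a given IT load at a given PUE
    (PUE = total facility power / IT equipment power)."""
    return it_load_kw * pue

old = facility_power_kw(1000, 1.5)  # 1500 kW total at a 1.5:1 PUE
new = facility_power_kw(1000, 1.2)  # 1200 kW total at a 1.2:1 PUE
print(old - new)                    # 300 kW saved: the 0.3 kW per kW above
```

That 300 kW is overhead (cooling, power conversion, lighting) that simply never has to be generated, renewable or otherwise.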

The most efficient privately owned data centers moving forward will be ones built as private-cloud infrastructures that can utilize resources on demand, scale-up/scale-down instantly and automatically shift workloads during non-peak times to power off unneeded equipment.  Even the best of these won’t come close to the potential efficiency of public cloud offerings which can leverage the same advantages and gain exponential benefits by spreading them across hundreds of global customers maintaining high utilization rates around the clock and calendar year.

Greenpeace lashing out at cloud and focusing only on pushes for renewable energy is naive and short-sighted.  Several other factors go into thinking green in the data center.  Power and cooling are definitely key, but what about utilization?  Turning a server off during off-peak times is great for saving power, but the components of that computer still had to be mined, shipped, assembled, packaged, and delivered to me in order to sit powered off a third of the day when I don’t need the cycles.  That hardware will still be refreshed on the same schedule, at which point some of the components may be recycled and the rest becomes non-biodegradable, and sometimes harmful, waste.

Large data centers housing public clouds have the promise of overall reduced power and cooling with maximum utilization.  You have to look at the whole picture to really go green.

Greenpeace: while you’re out there casting stones at big data centers, how about you publish some of your numbers?  Let’s see the power, cooling, and utilization numbers for your computing and data centers; actual numbers, not what you offset by sending a check to Al Gore’s bank account.  While you’re at it, throw in the costs and damage created by your print advertising (paper, ink, power), etc.  Give us a chance to see how green you are.
