Data Center Analytics: A Can’t-Live-Without Technology

** There is a disclaimer at the bottom of this blog. You may want to read that first. **

Earlier in June (2016), Chuck Robbins announced the Cisco Tetration Analytics Appliance. This is a major push into a much-needed space: Data Center Analytics. The appliance itself is a Big Data appliance, purpose-built from the ground up to provide enhanced real-time visibility into the transactions occurring in a data center. Before we get into what that actually means, let me set the stage.

IT organizations have a lot on their hands with the rapid change in applications and the fast-paced new demands for IT service. Some of these include:

  • Disaster Recovery Planning
  • Public and hybrid cloud migration
  • Enhancing security

A first step in successfully executing on any of these is having an understanding of the applications, and their dependencies. Most IT shops don’t have any idea exactly how many applications run in their data center, much less what all of their interdependencies are. What is required is known as an Application Dependency Mapping (ADM).

Quick aside on ‘application dependency’:

Applications are not typically a single container, VM, or server. They are complex ecosystems with outside dependencies. Let’s take a simple example of an FTP server. This may be a single VM application, yet it still has various external dependencies. Think of things like DNS, DHCP, IP storage, AD, etc. (If you don’t know the acronyms, you should still see the point.) If you were to migrate that single VM running FTP to a DR site, the cloud, etc. that did not have access to those dependencies, your ‘app’ would be broken.

The reason these dependency mappings are so vital for DR/cloud migration is that you need to know what apps you have, and how they’re interdependent, before you can safely move, replicate, or change them.
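To make the FTP example concrete, here is a minimal Python sketch that treats dependencies as a directed graph and walks it to find everything a workload transitively needs before it moves. The service names and the `DEPENDENCIES` table are hypothetical; building that table from real data is exactly what an ADM is for.

```python
from collections import deque

# Hypothetical dependency map: each key depends on the services in its list.
DEPENDENCIES: dict[str, list[str]] = {
    "ftp-vm": ["dns", "ad", "ip-storage"],
    "ad": ["dns"],
    "ip-storage": ["dhcp"],
    "dns": [],
    "dhcp": [],
}


def required_services(app: str, deps: dict[str, list[str]]) -> set[str]:
    """Return every service `app` needs, directly or transitively."""
    seen: set[str] = set()
    queue = deque(deps.get(app, []))
    while queue:
        svc = queue.popleft()
        if svc in seen:
            continue
        seen.add(svc)
        # Follow the dependency's own dependencies (breadth-first).
        queue.extend(deps.get(svc, []))
    return seen


def missing_at_target(app, deps, available):
    """Services the app needs that the DR site or cloud does not provide."""
    return required_services(app, deps) - set(available)


print(sorted(required_services("ftp-vm", DEPENDENCIES)))
# -> ['ad', 'dhcp', 'dns', 'ip-storage']
print(sorted(missing_at_target("ftp-vm", DEPENDENCIES, {"dns", "dhcp"})))
# -> ['ad', 'ip-storage']
```

Anything in the second list has to exist at the target, or be reachable from it, before the migrated ‘app’ will work.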

From a security perspective, the value is also immense. First, with full visibility into what apps are running, you can make intelligent decisions on which apps you may be able to decommission. Removing unneeded apps reduces your overall attack surface. This may be something as simple as shutting down an FTP server someone spun up for a single file move but never spun down (much less patched, etc.). The second security advantage is that once you have visibility into everything that is working, and needs to be, you can more easily create security rules that block the rest.

Traditionally, getting an Application Dependency Mapping is a painful, slow, and expensive process. It can be done using internal resources, documenting the applications manually. More commonly, it’s done by a 3rd party consultant using both manual and tooled processes. Aside from cost and difficulty, the real problem with these traditional services is that they’re so slow and static that the results become useless. If it takes six months to document your applications in a static form, the end result has little-to-no value, because the data center changes so rapidly.

The original Tetration project set out to solve this problem first, automatically and in real-time. I’ll discuss both what this means and the enhanced use-cases that Tetration’s method makes possible.

Data Collection: Pervasive Sensors

First, let’s discuss where we collect data to begin the process of ADM. Tetration uses several sources:

Network Sensors:

One place with great visibility is the network itself. Each and every device (server, VM, container), hereafter known as an ‘end-point’, must be connected through the network. That means the network sees everything. When looking to collect application data in the past, the best tool available would be Netflow. Netflow was designed as a troubleshooting tool, and provides great visibility into ‘flow header’ info from the switch. While Netflow is quite useful, it has limitations when utilized for things like security or analytics.

The first limitation is collection rates. Netflow is not a hardware (ASIC level) implementation. That means that in order to do its job on a switch, it must push processing to the switch CPU. To quickly simplify this, it means that Netflow requires heavy lifting, and data center switches can’t handle full-rate Netflow. Because of this, Netflow is set to sample data. For troubleshooting, this is no problem, but when trying to build an ADM, you’d ideally look at everything, not a sampling.

The next problem with Netflow for our purposes is the things it doesn’t look at. Remember, Netflow was really designed to assist in ‘after-action’ troubleshooting. Because of that intent, Netflow was never designed to collect things like time-stamps from the switch. When looking at ADM and application performance, time-stamping becomes very important, so having that, and other detailed information Netflow can’t provide, became very relevant.

Knowing this, the team chose not to rely on Netflow. They needed something more robust, specifically in the data center space. Instead, they designed a next generation of switching silicon that can provide what I lovingly call ‘Netflow on steroids.’ This is ‘line-rate’ data about all of the transactions traversing a switch, along with things like time-stamping and more.

That becomes our network ‘sensor.’ Using those sensors gives us an amazing view, but it’s not everything. What those sensors are really doing is not ADM; they’re simply telling us ‘who’s talking to whom, about what.’ For network engineers, this is the source and destination IP combined with the TCP/UDP port, plus some other info. Think of this as a connectivity mapping. To turn it into an application mapping, more data was needed.
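As a rough illustration of what ‘who’s talking to whom, about what’ boils down to, the sketch below collapses individual flow records into unique (source IP, destination IP, protocol, destination port) edges. The `FlowRecord` fields are an assumption made for illustration, not Tetration’s actual record format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    # Hypothetical flow metadata as a sensor might report it (no payload).
    src_ip: str
    dst_ip: str
    protocol: str  # "tcp" or "udp"
    src_port: int
    dst_port: int
    timestamp_ns: int


def connectivity_map(flows):
    """Collapse flows into (src, dst, protocol, dst_port) edges with a count.

    The ephemeral source port and the timestamp are dropped, since they
    vary per connection and say nothing about *what* is being talked about.
    """
    return Counter((f.src_ip, f.dst_ip, f.protocol, f.dst_port) for f in flows)


flows = [
    FlowRecord("10.0.1.5", "10.0.2.9", "tcp", 51512, 3306, 1_000),
    FlowRecord("10.0.1.5", "10.0.2.9", "tcp", 51514, 3306, 2_000),
    FlowRecord("10.0.1.5", "10.0.0.53", "udp", 40000, 53, 3_000),
]
for edge, count in connectivity_map(flows).items():
    print(edge, count)
```

The result is a connectivity map only: it knows that 10.0.1.5 talks to 10.0.2.9 on TCP 3306, not which application either end belongs to.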

As of the time of this writing, these ‘network sensors’ are built into the hardware of the Cisco Nexus 9200 and 9300-EX series switches.

Server Sensors:

To really understand an application, you want to be close to the app itself. This means sitting in the operating system. The team needed data that is simply not seen by the switch port. Therefore, they had to build an agent that could reside within the OS and provide additional information not seen by the switch. These are things like service name, who ran the service, what privileges the service was run with, etc. These server sensors provide an additional layer of application information. The server agents are built to be highly secure and very low-overhead.

Additional Data:

So far we’ve got a connectivity map that can be compared against service/daemon name and user/privileges. That’s still not quite enough. We don’t think of applications as the ‘service name’ in the OS. We think of applications like ‘expense system’, etc. To be able to turn the sensor data into real application mappings, the team needed to cross-reference additional information. They built integration for systems like AD, DNS, DHCP, and existing CMDB systems to get this information. This allows the connectivity map and OS data to be cross-referenced back to the business-level application descriptions.
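The cross-referencing step can be pictured as a join. This sketch resolves each endpoint in a connectivity map to a business application through two lookup tables standing in for DNS and a CMDB. Both tables and the application names are hypothetical; real integrations would query those systems rather than in-memory dictionaries.

```python
# Hypothetical stand-ins for DNS (IP -> hostname) and a CMDB (hostname -> app).
DNS = {"10.0.1.5": "exp-web01", "10.0.2.9": "exp-db01", "10.0.0.53": "ns1"}
CMDB = {"exp-web01": "Expense System", "exp-db01": "Expense System", "ns1": "DNS"}


def app_for(ip: str) -> str:
    """Map an IP to its business application, or flag it as unknown."""
    host = DNS.get(ip)
    return CMDB.get(host, "unknown") if host else "unknown"


def application_map(edges):
    """Turn (src, dst, proto, port) edges into app-to-app dependencies."""
    result = set()
    for src, dst, proto, port in edges:
        result.add((app_for(src), app_for(dst), f"{proto}/{port}"))
    return result


edges = [
    ("10.0.1.5", "10.0.2.9", "tcp", 3306),
    ("10.0.1.5", "10.0.0.53", "udp", 53),
    ("10.0.7.7", "10.0.2.9", "tcp", 22),
]
for dep in sorted(application_map(edges)):
    print(dep)
```

Endpoints that land in ‘unknown’ are the ones that still need manual input or another data source.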

Which sensors do I need?:

Obviously, the more pervasively deployed your sensors are, the better the data available. That being said, neither sensor has to be everywhere. Let’s go through three scenarios:


Ideal mode:

Ideally you would use a combination of both network and server sensors. This means you would have the OS agent in any supported operating system (VM or physical). You would also be using supported hardware for network sensors. Not every switch in the data center needs this capability for you to be in ‘ideal mode’: as long as every flow/transaction is seen once, you are there. This means that you could rely solely on leaf or access switches with this capability.

Server only mode:

This mode relies solely on agents in the servers. This is the mode most of Tetration’s early field trial customers ran in. This mode can be used as you transition to the ideal mode over the course of network refreshes, or can be a permanent solution.

Network only mode:

In instances where there is no desire to run a server agent, Tetration can still be used. In this operational mode, the system relies solely on data from switches with the built-in Tetration capability.

Note: The less pervasive your sensor network, the more manual input or data manipulation is required. The goal should always be to move towards the ideal mode described here over time.

So that sounds like a sh** load of data:

The next step is solving the big data problem all of these sensors created. This is a lot of data, coming in very fast, and it has to be made usable very, very quickly. If the data has to sit while it’s being processed, it will become stale and useless. Tetration needed to be able to ingest this data at line-rate and process it in real-time.

To solve this, the engineers built a very specialized big data appliance. The appliance runs on ‘bleeding edge IT buzzword soup’: Hadoop, Spark, Kafka, ZooKeeper, etc. It also contains quite a bit of custom software developed internally at Cisco for this specific task. On top of this underlying analytics engine, there is an easy-to-use interface that doesn’t require a degree in data science.

The Tetration appliance isn’t intended to be a generalist big data system, where you can throw any dataset at it and ask any question you want. Instead, it’s fine-tuned for data center analytics. The major advantage here is that you don’t need a handful of big data experts and data scientists to use the appliance.

Now what does this number crunching monster do?:

The appliance provides the following five supported use-cases. I’ll go into some detail on each.

  • Application Dependency Mapping (ADM)
  • Automated white-list creation
  • Auditing and compliance tools
  • Policy impact simulation
  • Historical forensics

That’s a lot, and some of it won’t be familiar, so let’s get into each.


Application Dependency Mapping (ADM):

First and foremost, the Tetration Appliance provides an ADM. This baseline mapping of your applications and their dependencies is core to everything else Tetration does. This ADM on its own is extremely useful, as mentioned in the opening of this blog. Once you have visibility into your apps, you can start building that DR site, migrating apps or parts of apps to the cloud, and assessing which apps may be prime for decommission.

Automated white-list creation:

If you’re looking to implement a ‘micro-segmentation’ strategy, there are several products, like Cisco’s Application Centric Infrastructure, that can do that for you. These are enforcement tools that can implement increasingly tight security segmentation, down to the server NIC or vNIC. The problem is figuring out what rules to put into these micro-segmentation tools. Prior to Tetration, nobody had a good answer to this. The issue is that without a current ADM, it’s hard to figure out what you can block, because you don’t know what you need open. In steps Tetration.

Once Tetration builds the initial ADM, you have the option to automatically generate a white-list. Think of the ADM as the original (what needs to talk); the white-list is derived from it: permit exactly that, and block everything else. Since Tetration knows everything that needs to be open for your production apps, it can turn this into rules that block the rest, using your existing tools or new-fangled fancy micro-segmentation tools.
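A minimal sketch of that conversion, assuming the ADM is available as a set of (source group, destination group, protocol, port) tuples: every observed dependency becomes an explicit permit, and a final default deny blocks everything else. The rule text is a generic, made-up pseudo-ACL format for illustration, not the syntax of ACI or of any particular firewall.

```python
def whitelist(adm_edges):
    """Build permit rules for observed dependencies, then deny the rest."""
    rules = [
        f"permit {proto} from {src} to {dst} port {port}"
        for src, dst, proto, port in sorted(set(adm_edges))
    ]
    # Anything not seen in the ADM is treated as unneeded, so block it.
    rules.append("deny ip from any to any")
    return rules


adm = [
    ("web", "db", "tcp", 3306),
    ("web", "dns", "udp", 53),
    ("web", "db", "tcp", 3306),  # duplicates collapse into one rule
]
print("\n".join(whitelist(adm)))
```

The trailing default deny is what makes this a white-list: the permits come straight from the ADM, so the quality of the rules is only as good as the completeness of the mapping.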

Auditing and Compliance:

Auditing and compliance regulations are always a challenge in time, money, and frustration, but they’re both necessary and required. There are two issues with traditional audit methodologies that Tetration helps with. Auditing is typically done by pulling configurations from multiple devices (network, security, etc.) and then verifying that the security rules in those devices meet the compliance requirements.

The two ways Tetration helps are centralization (single source of truth) and real-time accuracy. Because Tetration is viewing all transactions on the network, it can be the tool you audit against. This alleviates the need to pull information from multiple different devices in the data center, which streamlines the audit process significantly, from both a collection and a correlation perspective.

What I find more interesting is that using Tetration as the auditing tool lets you audit reality rather than theory. Let me explain: when you do a traditional audit, you’re looking at the configuration of rules in security devices and assuming that those rules and devices are doing their job, and that nobody has gotten around them. On the other hand, when you do your audit using Tetration, you’re auditing against the real-time traffic flows in your data center: ‘the reality.’
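To illustrate the difference, here is a sketch that audits observed flows, rather than configured rules, against one compliance requirement. The zone subnets and the requirement itself (nothing in a ‘nonsecure’ zone may connect into the ‘secure’ zone) are hypothetical examples.

```python
import ipaddress

# Hypothetical zones and a hypothetical compliance requirement:
# nothing in the "nonsecure" zone may open a connection into the "secure" zone.
ZONES = {
    "secure": ipaddress.ip_network("10.10.0.0/16"),
    "nonsecure": ipaddress.ip_network("10.20.0.0/16"),
}


def zone_of(ip: str) -> str:
    addr = ipaddress.ip_address(ip)
    for name, net in ZONES.items():
        if addr in net:
            return name
    return "other"


def audit(observed_flows):
    """Return observed (src, dst, port) flows that violate the requirement."""
    return [
        (src, dst, port)
        for src, dst, port in observed_flows
        if zone_of(src) == "nonsecure" and zone_of(dst) == "secure"
    ]


observed = [
    ("10.10.1.4", "10.10.2.8", 443),   # secure -> secure: fine
    ("10.20.3.3", "10.10.2.8", 1433),  # nonsecure -> secure: violation
]
print(audit(observed))
```

A configuration audit would only confirm that a blocking rule exists; a check like this reports traffic that actually crossed the boundary, whatever the configuration says.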

Policy impact simulation:

One of the things the Tetration appliance is doing as it collects data is extremely relevant to the following two use-cases. As the appliance ingests data, it may receive multiple copies of the same transaction. Think server A talking to server B across switch Z, all three reporting that transaction. As this occurs, the appliance de-duplicates this data and stores one master copy of every transaction down to the cluster file system. This means the appliance keeps a historical record of every transaction in your data center. Don’t start worrying about space yet; remember, this is all metadata (data about the data), not payload information, so it’s very lightweight.
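The de-duplication idea can be sketched by keying each report on the transaction rather than on the reporter. Here the key is the 5-tuple plus a coarse time bucket. Both the key and the one-second bucket are assumptions for illustration, not a description of how the appliance actually matches reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    reporter: str  # which sensor saw it: a server agent or a switch
    src: str
    dst: str
    proto: str
    sport: int
    dport: int
    ts_ms: int


def dedupe(reports, bucket_ms=1000):
    """Keep one master record per transaction, remembering who saw it.

    Simplification: reports that straddle a bucket boundary are not merged.
    """
    master = {}
    for r in reports:
        key = (r.src, r.dst, r.proto, r.sport, r.dport, r.ts_ms // bucket_ms)
        master.setdefault(key, set()).add(r.reporter)
    return master


reports = [
    Report("server-a", "10.0.0.1", "10.0.0.2", "tcp", 50000, 443, 12_010),
    Report("switch-z", "10.0.0.1", "10.0.0.2", "tcp", 50000, 443, 12_011),
    Report("server-b", "10.0.0.1", "10.0.0.2", "tcp", 50000, 443, 12_013),
]
records = dedupe(reports)
print(len(records), "transaction seen by", sorted(next(iter(records.values()))))
```

Three reports from server A, server B, and switch Z become one stored transaction.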

The first thing the appliance can do with that historical record is policy simulation. Picture yourself as the security team wanting to implement a new security rule. You always face one challenge: the possibility that a rule you implement has an adverse effect. How do you ensure you won’t break something in production if you don’t have full visibility into real-time traffic? The answer is, you don’t.

With Tetration, you do. Tetration’s policy impact simulation allows you to model a security change (FW rule, ACL, etc.) and then have the system do an impact analysis. The system assesses your proposed change against the historical transaction records and lets you know the real-world implications of making that change. I call this a ‘parachute’ for new security policies. Rather than waiting for a change window, hoping the rule works, and rolling it back if it breaks something, you can simply test it against real traffic records.
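In spirit, the ‘parachute’ works like the sketch below: evaluate a proposed rule against stored historical records and report which real transactions it would have blocked. The rule shape (a deny on a destination subnet and port) and the sample history are hypothetical; a real system would support far richer rule matching.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class DenyRule:
    # Hypothetical, deliberately simple rule: block traffic to a subnet/port.
    dst_net: str
    dport: int


def impact(rule: DenyRule, history):
    """Return historical (src, dst, dport) records the rule would have blocked."""
    net = ipaddress.ip_network(rule.dst_net)
    return [
        rec for rec in history
        if ipaddress.ip_address(rec[1]) in net and rec[2] == rule.dport
    ]


history = [
    ("10.1.1.10", "10.2.0.5", 22),
    ("10.1.1.11", "10.2.0.6", 8443),  # a production dependency someone forgot
    ("10.1.1.12", "10.3.0.7", 8443),
]
proposed = DenyRule(dst_net="10.2.0.0/24", dport=8443)
affected = impact(proposed, history)
print(f"{len(affected)} historical transaction(s) would break:", affected)
```

An empty result is the green light; anything else is a list of real conversations to investigate before the change window.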

Historical Forensics:

As stated above, Tetration maintains a de-duplicated copy of every transaction that occurs in the data center. On top of that unique source of truth, they’ve built both an advanced, granular search capability and a data center ‘DVR’. This means you can go back and search anything that happened in the data center after Tetration was installed, or even play back all transaction records for a given period of time. This is an extremely powerful tool in the area of security forensics.
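The core of the search and ‘DVR’ playback idea is a time-window query over stored records, which can be sketched with a sorted list and binary search. The record layout is hypothetical, and the sketch ignores the distributed storage a real appliance would need.

```python
import bisect

# Hypothetical stored transactions, kept sorted by timestamp (seconds).
RECORDS = [
    (100, "10.0.0.1", "10.0.0.9", 22),
    (250, "10.0.0.4", "10.0.0.9", 445),
    (300, "10.0.0.4", "10.0.0.7", 445),
    (900, "10.0.0.2", "10.0.0.9", 443),
]
TIMES = [r[0] for r in RECORDS]


def playback(start: int, end: int, **filters):
    """Yield records with start <= ts < end, in order, matching all filters."""
    lo = bisect.bisect_left(TIMES, start)
    hi = bisect.bisect_left(TIMES, end)
    fields = ("ts", "src", "dst", "dport")
    for rec in RECORDS[lo:hi]:
        row = dict(zip(fields, rec))
        if all(row[k] == v for k, v in filters.items()):
            yield row


# Replay everything one host did in a window of interest.
for row in playback(200, 400, src="10.0.0.4"):
    print(row)
```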


Tetration is a unique product with a wide range of features. It’s a purpose-built data center analytics appliance providing visibility and granular control not formerly possible. If you have more time than sense, feel free to learn more by watching these whiteboard videos I’ve done on Tetration:


App Migration:


Real-time Analytics:

** Disclaimer **

I work for Cisco, with direct ties to this product, therefore you may feel free to consider this post biased and useless.

This post is not endorsed by Cisco, nor a representation of anyone’s thoughts or views other than my own.

** Disclaimer **


The Data Center Network OS – Cisco Open NXOS

The Insieme team (INSBU) at Cisco has been working hard for three years to bring some major advances to Cisco’s Nexus portfolio. The two key platforms we’ve developed are Cisco Application Centric Infrastructure (ACI) and the Nexus 9000 data center switching platform. One of the biggest projects and innovations we’ve focused on is the operating system (OS) itself on the Nexus 9000 and 3000 platforms (more platforms to follow). This OS is known as Cisco Open NXOS. The focus here is on open programmability of network functionality on a device-by-device or system-level basis.

Lots of features and supported tools have been baked into the OS to produce the industry’s leading OS for network automation and programmability. These tools drive faster time-to-market for new applications and services, better integration with automation and orchestration systems, and lower Operating Expenses (OpEx). Some of the key features baked into the OS are:

  • DevOps tooling for automation and orchestration
  • Open 3rd party application integrations
  • Open object-based APIs, Python scripting, Linux native tool access, and access to the underlying BRCM shell

The overall goal is customer choice. Networking and Software Defined Networking (SDN) are not one-size-fits-all technologies. The needs of cloud providers, carrier networks, enterprise, commercial, financials, etc. are all different. With that in mind, we built in the most popular existing and emerging tools so customers can use the tool(s) that best support their operational model. All of this is built on the same foundation that underpins Cisco ACI, the industry’s leading SDN solution.


Rather than bore you with my babble, I’ll let the experts show you what we’ve developed. The video below is a Demo Friday with SDx Central. Two of my colleagues, Technical Marketing Engineer Nicolas Delecroix and Product Manager Shane Corbin, put this content and demo together for your viewing pleasure.


For another look and some more info see the video the folks at Cisco TechWise TV put together here:


Application Centric Infrastructure – The Platform

** I earn a paycheck as a Principal Engineer for the Cisco business unit responsible for ACI, Nexus 9000, and Nexus 3000. This means you may want to simply consider anything I have to say on the subject biased and/or useless. Your call. **

I recently listened to a 30-minute SDN analysis from Greg Ferro, who considers himself to be a networking industry expert. In it, he goes through his opinions of several SDN approaches and offers several guesses as to where things will be. One of the things that struck me during the recording was that he describes Cisco’s Application Centric Infrastructure (ACI) as a platform for lock-in. While he’s right on the platform part, he’s missing the mark on the lock-in. This is not an attack on Greg; if someone who considers himself an in-the-know expert on the subject still believes this, maybe we haven’t gotten the message out in the right way. Let’s try to rectify that here.

What is ACI:

This is a bit of a tough question, because ACI really is an application connectivity platform. ACI can be:

  • An automated network fabric that relieves admins of dealing with route, VLAN, port, firmware, etc. provisioning on a daily basis.
  • A security appliance that acts as a line-rate stateless firewall while automating virtual and physical service appliances from any vendor to provide advanced L4-7 services.
  • A policy automation engine. This is where we will focus in this blog.

ACI – A Policy Automation Engine:

Let’s start with a quick definition of policy for these purposes: policy is the set of requirements placed on data as it traverses the network. Policy can be thought of with two mindsets. Some examples of each are below; they are in no particular order and are not intended to match across columns.

Business/Application Mindset      Infrastructure/Operations Mindset
Security                          VLAN
Compliance                        Subnet
Governance                        Firewall Rule
Risk                              Load-Balancing
Geo-dependency                    ACL
Application Tiers                 Port-config
User Experience                   Redundancy

The application of policy on the network has two major problems:
  1. No direct translation between business requirements and infrastructure capabilities/configuration.
  2. Today it’s manual configuration on disparate devices, typically via a CLI.

As a policy automation engine, ACI looks to alleviate that by mapping application-level policy language, like the left column, to automated infrastructure provisioning using the constructs in the right column. In this usage, ACI can be deployed in a very different fashion from the network fabric it’s more traditionally thought of as.

The lack of understanding around this model for ACI revolves around three misconceptions or misunderstandings:

  • ACI requires all Cisco Nexus 9000 hardware.
  • ACI cannot integrate with other switching products.
  • ACI’s integration with L4-7 service devices requires management of those devices.

With that in mind, let’s take a look at deploying ACI into an existing data center switching infrastructure as a policy automation appliance.


ACI Requires all Nexus 9000 Hardware:

ACI does not require all devices to be Nexus 9000 based. ACI is based on a combination of hardware and software, a balanced approach that puts each function where it is done best. Software control systems provide agility, while hardware provides acceleration for that software along with performance, scale, etc. Because of this, ACI has a small set of controller and switching components. This hardware does not need to replace any other hardware, and can simply be added with a natural data center expansion, then integrated into the existing network. In fact, this is the way the majority of ACI customers are deploying ACI today.

Here is a breakdown of the ACI requirements for a production environment:

  • A controller cluster consisting of 3 Application Policy Infrastructure Controller (APIC) appliances.
  • 2x Spine switches which can be Nexus 9336 36x 40G ACI spine switches.
  • 2x Leaf switches which can be Nexus 9372 switches which have 48x 10G and 6x 40G ports.

That is the complete initial and final requirement to simply use ACI as a policy automation engine. From there, policy on any network equipment can be automated. The system can then optionally scale with more Nexus 9000, or other switching solutions can be used for connectivity.

Integrating ACI with other switching products:

First, we need to understand where policy is enforced on a traditional network. For reference take a look at the graphic below.

On a traditional data center network, we group connected objects using Layer 2 VLANs, providing approximately 4000 groups or segments depending on implementation. Within these VLAN groups, most, if not all, communication is allowed by default (no policy is enforced). We then attach a single subnet, a Layer 3 object, to the VLAN. When traffic moves between VLANs, it is also moving between subnets, which requires a routed point. This default gateway is where policy is enforced. Many devices can act as the default gateway, and whichever device does is the most typical policy enforcement point.
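The forwarding logic described above fits in a few lines of Python. In this sketch, if source and destination share a subnet (and therefore a VLAN), traffic is switched with no policy applied; otherwise it must be routed through the default gateway, where the policy is checked. The VLAN numbers, subnets, and the single allowed flow are hypothetical.

```python
import ipaddress

# Hypothetical VLAN/subnet pairs: one subnet per VLAN, as described above.
SUBNETS = {
    100: ipaddress.ip_network("10.100.0.0/24"),  # web VLAN
    200: ipaddress.ip_network("10.200.0.0/24"),  # db VLAN
}
# Hypothetical policy at the gateway: allowed (src VLAN, dst VLAN, dst port).
GATEWAY_POLICY = {(100, 200, 3306)}


def vlan_of(ip: str) -> int:
    addr = ipaddress.ip_address(ip)
    for vlan, net in SUBNETS.items():
        if addr in net:
            return vlan
    raise ValueError(f"{ip} is not in a known subnet")


def forward(src: str, dst: str, dport: int) -> str:
    s, d = vlan_of(src), vlan_of(dst)
    if s == d:
        # Same VLAN/subnet: switched locally, allowed by default, no policy.
        return "switched (no policy)"
    # Different subnets: routed via the gateway, where policy is enforced.
    if (s, d, dport) in GATEWAY_POLICY:
        return "routed, permitted"
    return "routed, denied"


print(forward("10.100.0.5", "10.100.0.6", 22))
print(forward("10.100.0.5", "10.200.0.9", 3306))
print(forward("10.100.0.5", "10.200.0.9", 22))
```

The important branch is the first one: two hosts in the same VLAN never reach the enforcement point at all, which is why the gateway is where policy lives.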

The way traditional networks handle this policy enforcement point is at the Aggregation Layer of a physical or logical 3-tier network design. The diagram below depicts this.

Depending on design, the L3 boundary (default gateway) may be a router or an L4-7 appliance such as a firewall. This is the policy enforcement point within traditional network designs. For traffic to move between groupings/segments (VLANs), it must traverse this boundary where policy is enforced. It is exactly at this point that ACI is inserted as a Policy Automation Engine, integrated with the existing infrastructure. The diagram below shows one example of this.


In the diagram, you’re attaching the minimal production ACI requirements (controller cluster, 2x spine switches, and 2x leaf switches) to the existing network at the aggregation layer (green links depict new links). From there, the only requirement to utilize ACI as the policy automation engine for the entire network is to trunk the VLANs to the ACI leaf and move the default gateway (the policy enforcement point shown above). ACI can now automate policy for the network as a whole.

This can be used as a migration strategy, or a permanent solution. There is no requirement to migrate all switches to Nexus 9000 or even Cisco switches up front or over time. A customer can easily maintain a dual-vendor strategy, etc. while utilizing ACI. Many benefits can be found by implementing with Nexus 9000 as needed, but that is always a decision based on the pros and cons seen by a given organization.

The diagram above shows a logical depiction of how ACI would be added to an existing network while automating policy enforcement for the entire network. The diagram below shows the same thing in a different fashion, to help visualize ACI as a service appliance automating policy. The only additional changes are that the firewalls have been re-cabled to ACI leaf switches for traffic optimization, and a second pair of ACI leaf switches has been added for visual symmetry.


In this model, ACI is not ‘managing’ the existing switching; it is automating policy network-wide. Policy is the most frequent change point, if not the only one, so it is the point that requires automation. The existing infrastructure is in place, configured, and working; there is no need to begin managing it in a different fashion, because that’s not where agility comes from.


ACI’s Integration with L4-7 service devices requires management of those devices:

This is one of the more interesting points of the way ACI integrates with other systems. With most solutions, when you add a control/management system, it is in complete control of all devices and needs them at a default or baseline configuration. ACI operates on a different control model, which allows it to integrate and pass instructions to a device without needing to fully manage it. What this means is that ACI can integrate with L4-7 devices already in place, configured, and in use. ACI makes no assumptions about existing configuration, and simply passes commands to be implemented on those devices for new configurations in ACI. Additionally, these commands are implemented natively on the device.

What this means is that there is no lock-in at all by integrating ACI with existing devices, or adding new virtual/physical appliances and integrating them with ACI. To put this more succinctly:

  • ACI can integrate with existing L4-7 devices such as firewalls, without the need for config change or migration.
  • ACI can be removed from a network after integration with no need for config change or migration on L4-7 devices. This is because the configuration is native to the device, and the device is never ‘managed’ by ACI. The device is simply used programmatically while ACI is in place.


ACI is a platform for the deployment of applications on the network, not a platform for lock-in. In fact, it is designed as a completely open system using standards-based protocols, open APIs northbound and southbound, and open 3rd party device integration. It can be used as:

  • An automated pool of network resources, similar to the way UCS does with servers.
  • A security and compliance engine.
  • A policy automation engine as described above.

There is plenty of information available on ACI, take some time to get an idea of what it can do for you.


The Power of Fully Open Programmability With Cisco ACI

** Disclaimer: I work as a Principal Engineer for Cisco focused on ACI and Nexus 9000 products. Feel free to assume I have bias if you wish. **


One of the many things that make me passionate about Cisco Application Centric Infrastructure is the fully open programmable nature of the product. Unlike competing products that claim a programmable API and then hide the important stuff behind licensing, royalties, or other add-on software you have to buy, Cisco ACI fully exposes all system functionality, including the full object model and RESTful API with XML and JSON bindings. This means that anything the system can do, you can code to. As an example of this, our own GUI and CLI that ship with the product use the same API that’s exposed to anyone, with no additional hooks or capability.
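As a taste of what coding to that API looks like, here is a sketch using only the Python standard library. It builds, but does not send, the two requests a script typically starts with: a login to the APIC’s `aaaLogin` endpoint and a class query for all tenants (`fvTenant`). The controller address and credentials are placeholders, and you should confirm the exact URLs and payload against the APIC REST API documentation for your release.

```python
import json
import urllib.request

APIC = "https://apic.example.com"  # placeholder controller address


def login_request(user: str, password: str) -> urllib.request.Request:
    """Build the JSON login request; the APIC replies with a session token."""
    body = {"aaaUser": {"attributes": {"name": user, "pwd": password}}}
    return urllib.request.Request(
        f"{APIC}/api/aaaLogin.json",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def class_query(class_name: str) -> urllib.request.Request:
    """Build a GET for every object of one class in the object model."""
    return urllib.request.Request(f"{APIC}/api/class/{class_name}.json")


login = login_request("admin", "example-password")
tenants = class_query("fvTenant")
print(login.get_method(), login.full_url)
print(tenants.get_method(), tenants.full_url)
# Actually sending these would use urllib.request.urlopen(), with the
# session token from the login response presented on later requests.
```

The same two patterns (authenticate, then read or write objects by class or by distinguished name) cover a large share of day-to-day automation.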

To many people this may not mean much on the surface, because they don’t work with code. That being said, whether you’re at home on a CLI or a GUI, native programmability can help you with your day job, whether or not you want to dabble in writing code. Obviously, you can home-grow anything to automate or simplify day-to-day tasks, but if you don’t live in a world of code and compilers, you can simply leverage the community through Cisco DevNet or GitHub. Plenty of applications and use-cases are already freely available, with contributions happening daily. This means you can download what you want and tweak it, or use it as is.

One specific set of tools caught my interest and prompted this blog. Michael Smith, one of Cisco’s Distinguished Engineers, developed the ‘ACI Tool Kit’ with the help of some other engineers. ACI provides an abstracted object model of network, security, and service functionality; this is the basis for the overall programmability of the architecture. Michael’s tool kit exposes a subset of the overall functionality of ACI in a way that’s more consumable for day-to-day use. You can think of it as an introduction, but the functionality goes far beyond that. Basically, it’s a fast track to getting the most common workflows rolling quickly.


I won’t belabor the description of the Tool Kit because it’s well documented via the link above. Better yet, because all of the documentation is auto-generated from the code itself, it’s always up-to-date. Instead let’s take a deeper look into some examples of applications and use-cases for them already available.

ACI Endpoint Tracker:

Ever wonder what’s attached to your data center network as a whole? Wonder what moves where, what attaches or detaches? Within a traditional network environment, that’s a tough piece of information to gather, much less track and keep updated. Within ACI, that story changes completely. The brain of an ACI fabric is the Application Policy Infrastructure Controller (APIC), and it’s always aware of everything attached to the fabric. This means that an ACI fabric natively has all of the information described above; the trick is getting it out and working with it. In steps ACI Endpoint Tracker.

In a nutshell, ACI Endpoint Tracker subscribes to a WebSocket on the APIC, which pushes endpoint information to it as changes happen; Endpoint Tracker stores that information in a MySQL database, where it can be queried. Endpoint Tracker then provides a set of tools to look at and analyze that data. You can run direct database queries or use the GUI front-end Mike developed. An example of that front-end is pictured below.

This provides a quick, easy interface to dig out information on which endpoints are attached, when they attached, when they detached, etc. You can also search based on date/time, MAC, IP, etc. for any given tenant, app, or group. This provides some pretty powerful analytics. Better yet, you can take what’s there and extend it. Have a pretty static environment and want to be alerted when new devices attach? No problem. Want to see what was connected for a specific tenant at midnight on Christmas? No problem.

The best part is that your controller doesn’t take a performance hit for this, because it’s not doing the work. The information is pushed to the MySQL database in real-time, and all of the queries made by ACI Endpoint Tracker go to that database server. This means retention, performance, etc. are all up to you, based on the disk size and compute capacity you want to dedicate. To play with the GUI front-end using some example data, just hit this link: ACI Endpoint Tracker GUI Demo.
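The storage side of that design can be sketched with the standard library. The sketch assumes each pushed message is JSON in the APIC’s usual `imdata` envelope, carrying `fvCEp` (client endpoint) objects whose attributes include `mac`, `ip`, `dn`, and a `status` such as created, modified, or deleted; verify that format against the APIC documentation. SQLite stands in for MySQL so the example runs anywhere, and the table layout is hypothetical, not Endpoint Tracker’s actual schema.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the MySQL server
db.execute(
    "CREATE TABLE endpoints (mac TEXT, ip TEXT, dn TEXT, event TEXT, ts TEXT)"
)


def store_event(message: str, ts: str) -> int:
    """Parse one pushed message and record each endpoint event it contains."""
    rows = []
    for obj in json.loads(message).get("imdata", []):
        attrs = obj.get("fvCEp", {}).get("attributes")
        if attrs:
            rows.append((attrs.get("mac"), attrs.get("ip"), attrs.get("dn"),
                         attrs.get("status") or "created", ts))
    db.executemany("INSERT INTO endpoints VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)


sample = json.dumps({"imdata": [{"fvCEp": {"attributes": {
    "mac": "00:50:56:AA:BB:CC", "ip": "10.1.1.20",
    "dn": "uni/tn-Prod/ap-Web/epg-Front/cep-00:50:56:AA:BB:CC",
    "status": "created"}}}]})
store_event(sample, "2015-12-25T00:00:00")

# Endpoint events for tenant "Prod" up to midnight on Christmas.
query = ("SELECT mac, ip, event FROM endpoints "
         "WHERE dn LIKE 'uni/tn-Prod/%' AND ts <= '2015-12-25T00:00:00'")
for row in db.execute(query):
    print(row)
```

Because every query hits the database rather than the controller, you can add indexes, retention policies, or alerting on new rows without touching the APIC.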

ACI Lint:

The ACI Lint tool is used for static analysis of your ACI fabric. Basically, it’s a configuration analysis tool that checks against several policies to report configuration that could be problematic, similar to what the lint static code analysis tool does for C. It provides a list of configuration warnings/errors that should be looked into, such as orphaned objects or stale configuration. The code is also designed to be extensible with custom rules. Mike uses a compliance check as an example. ACI can utilize ‘tags’ for any object. In an environment requiring compliance, you can use ‘secure’ and ‘nonsecure’ tags. ACI Lint can then run one check to ensure every object is tagged, and another to ensure that no secure group is configured to communicate with a nonsecure group. This is just one example; the possibilities are endless. The beauty here is that these aren’t misconfigurations or system errors; Lint is checking for inconsistencies in configured objects.
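Mike’s compliance example translates naturally into two rules. The sketch below runs them over a hypothetical, simplified in-memory model (group name to tag, plus the pairs of groups allowed to communicate). It is not ACI Lint’s actual rule API; take that from the tool kit documentation.

```python
# Hypothetical simplified config: EPG -> tag, and EPG pairs that may talk.
EPG_TAGS = {"web": "nonsecure", "app": "secure", "db": "secure", "test": None}
ALLOWED = [("web", "app"), ("app", "db"), ("web", "db")]


def untagged(tags):
    """Rule 1: every object must carry a secure or nonsecure tag."""
    return [epg for epg, tag in tags.items() if tag not in ("secure", "nonsecure")]


def secure_to_nonsecure(tags, allowed):
    """Rule 2: no secure EPG may communicate with a nonsecure EPG."""
    return [
        (a, b) for a, b in allowed
        if {tags.get(a), tags.get(b)} == {"secure", "nonsecure"}
    ]


for epg in untagged(EPG_TAGS):
    print(f"WARNING: {epg} is not tagged")
for a, b in secure_to_nonsecure(EPG_TAGS, ALLOWED):
    print(f"ERROR: secure/nonsecure communication between {a} and {b}")
```

Each rule is just a function over configuration objects, which is what makes this style of checker easy to extend with your own policies.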

Cableplan Application

Anyone who’s worked in networking long enough has had some experience with cabling problems. The Cableplan app lets you take existing cable plans and match them against the running cable plan imported from the APIC. This is a quick and easy way to ensure that the intended cabling stays consistent, or to verify Layer 1 topology before moving up the stack with troubleshooting. See if your network virtualization solution can help you troubleshoot the network stack in any similar fashion.
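At its core, matching an intended cable plan against the running one is a set comparison of links. The sketch below shows that idea only; the real Cableplan app works with its own cable plan file format, while the tuple representation, switch names, and port names here are hypothetical.

```python
# Hypothetical representation: a link joins two (switch, interface) endpoints.
# frozenset makes each link unordered, so leaf1->spine1 equals spine1->leaf1.
def to_links(pairs):
    return {frozenset(pair) for pair in pairs}


def compare_cable_plans(intended, running):
    """Return (missing, unexpected) links between an intended and a running plan."""
    planned, actual = to_links(intended), to_links(running)
    return planned - actual, actual - planned


intended = [
    (("leaf1", "eth1/49"), ("spine1", "eth1/1")),
    (("leaf1", "eth1/50"), ("spine2", "eth1/1")),
]
running = [  # as discovered by the fabric (made-up sample)
    (("spine1", "eth1/1"), ("leaf1", "eth1/49")),
    (("leaf1", "eth1/50"), ("spine2", "eth1/2")),  # cabled to the wrong spine port
]

missing, unexpected = compare_cable_plans(intended, running)
for link in missing:
    print("missing:   ", " <-> ".join(f"{sw} {port}" for sw, port in sorted(link)))
for link in unexpected:
    print("unexpected:", " <-> ".join(f"{sw} {port}" for sw, port in sorted(link)))
```

Treating links as unordered pairs means a plan written from the spine’s point of view still matches cabling discovered from the leaf’s point of view.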

Snapback: Configuration Snapshot and Rollback

Humans aren’t perfect; we all make mistakes (myself excluded of course.) Those mistakes can mean downtime when we’re talking network configuration. Because of that, this is my favorite of the tools Mike built. The configuration of an entire ACI physical and virtual network is basically text formatted as XML or JSON. Therefore it can easily be imported/exported, etc. Using this functionality and ACI’s programmability, Mike built a tool that provides capabilities to snapshot the entire network (not just the virtual overlay) and roll it back if/when needed.

Some examples Mike provides are:

  • Live snapshots of the running configuration
  • One-time or recurring snapshots, which can be initiated immediately or scheduled. I love this because I can schedule a snapshot each evening, run a diff to see changes day-to-day, and roll back to yesterday if needed.
  • Versioning and storage of versioned configs
  • Ability to view both snapshots and differences between snapshots
  • Full or partial rollback
  • A web-based CLI for administration.
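Since the configuration is just JSON or XML text, the nightly "diff to see changes day-to-day" workflow can be sketched with the Python standard library alone. This is not Snapback’s implementation: the config dictionaries below are hypothetical and only loosely modeled on ACI’s `fvTenant` object, and actually rolling back (pushing an older snapshot to the APIC) is out of scope here.

```python
import difflib
import json


def snapshot_lines(config):
    """Serialize a config deterministically (sorted keys) so snapshots diff cleanly."""
    return json.dumps(config, indent=2, sort_keys=True).splitlines()


def diff_snapshots(old, new, old_name="yesterday", new_name="today"):
    """Unified diff between two config snapshots, as a list of lines."""
    return list(
        difflib.unified_diff(
            snapshot_lines(old),
            snapshot_lines(new),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",
        )
    )


# Heavily simplified, hypothetical tenant configs
yesterday = {"fvTenant": {"name": "Tenant-A", "epgs": ["Web", "App"]}}
today = {"fvTenant": {"name": "Tenant-A", "epgs": ["Web", "App", "Test"]}}

print("\n".join(diff_snapshots(yesterday, today)))
```

Sorting keys before serializing matters: without it, two logically identical snapshots could produce a noisy diff just because keys came back in a different order.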


These are just a few examples of what is freely available online right now. More importantly this is just a regurgitation of the documentation at the link at the top of this post. Don’t take my word for it, download the code, dig into the documentation and play. Once you’ve done that, CONTRIBUTE!

ACI all the things!


Building the Right Network for your VMware NSX Deployment

**  I have and will continue to make some edits to this post based on feedback. I’ll leave the original text with strikethrough, and make modifications in blue.  First I’m copying the disclaimer to the top.  **

** Disclaimer: I work for Cisco as a Principal Engineer focused on Cisco ACI and Nexus 9000 product lines; this means you’re welcome to write all of this off as biased drivel. That being said, I intend for this post to be technically accurate and support people looking to deploy these solutions. If I’ve made a mistake, let me know. I’ll correct it.  **

VMware NSX is a hot topic within the SDN conversation, and many customers are looking to deploy it for test or proof-of-concept purposes. It’s probably not quite ready for production prime-time, as indicated by the roughly 250 paying customers VMware has claimed at this point. Those numbers, for a product that is 7+ years old from the Nicira perspective and several years old as NSX after Nicira’s acquisition, suggest there is still work to be done.

That being said, VMware has built a message around NSX that originally focused on networking, then morphed into security due to lack of interest and traction. The VMware NSX security message is focused on a very important story: micro-segmentation. Micro-segmentation is a different way of looking at the concept of security within the data center. Traditionally we’ve focused on what’s called perimeter security, securing traffic as it enters or exits the data center to the campus or the WAN. Within this architecture very little security is provided between servers, or server groups, within the data center. The graphic below shows an example of this.


The problem with this model is that little to no security is provided, or available for server to server communications. This poses an issue as modern network exploits are constantly advancing. Many exploits are designed to find a single vulnerable server within the data center, exploit that server, then utilize that server to attack other servers from within the ‘trusted zone.’ This is only one example of the need for modern security designs such as micro-segmentation. Other examples may be compliance zones, multi-tenancy designs, etc. The graphic below depicts the general concept of micro-segmentation.



As shown in the diagram above, micro-segmentation applies additional security at the edge to filter and protect traffic. This provides the ability to secure traffic between physical/virtual servers or groups of servers. Edge security is typically layered in as an addition to perimeter security, and can be done with virtual appliances, physical appliances, or both. This typically means implementing stateless or stateful inspection devices capable of securing traffic between server groups.
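The core of a micro-segmentation policy is a default-deny check between groups: a flow is allowed only if its source group, destination group, and port are explicitly permitted. The sketch below is a stateless, hypothetical illustration of that idea (group names, IPs, and ports are made up), not how any particular product implements it. Note that it also denies traffic within the same group unless listed.

```python
# Hypothetical group membership (server IP -> group) and allowed flows.
GROUPS = {"10.0.1.10": "web", "10.0.2.20": "app", "10.0.3.30": "db"}
ALLOWED = {
    ("web", "app", 8080),  # web tier may call the app tier
    ("app", "db", 3306),   # app tier may query the database
}


def permitted(src_ip, dst_ip, dst_port):
    """Default-deny check: allow only flows whose group pair and port are listed."""
    src, dst = GROUPS.get(src_ip), GROUPS.get(dst_ip)
    if src is None or dst is None:
        return False  # unknown endpoints are denied
    return (src, dst, dst_port) in ALLOWED


print(permitted("10.0.1.10", "10.0.2.20", 8080))  # web -> app: True
print(permitted("10.0.1.10", "10.0.3.30", 3306))  # web -> db: False
```

The second check is the lateral-movement case described earlier: a compromised web server inside the "trusted zone" still cannot reach the database directly.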

VMware NSX for micro-segmentation:

The key benefit of VMware NSX is its distributed firewall within the hypervisor, a capability that several products can provide. It allows traffic between VMs to be inspected and enforced locally within a hypervisor. This can provide security benefits for VM traffic, as well as potential performance benefits in the form of reduced latency for traffic that is switched locally within a single hypervisor, although that is not a common scenario. The same level of security can, of course, be provided in other ways (this is IT, it always depends.) One of those ways would be switching traffic and enforcing policy consistently at the network edge (leaf or access layer.)

When choosing whether or not to use VMware NSX you’ll want to consider a few important factors:

  • Performance – There will be significant CPU overhead in each hypervisor which is directly proportional to the amount of traffic inspection/enforcement being done on VM traffic. You’ll want to weigh this overhead against the real purpose of the hypervisor CPU: running your VMs.
  • Hypervisor(s) used – VMware NSX is actually two products:
    • NSX for VMware vSphere, which only works with the VMware hypervisor and management tools
    • NSX Multi-Hypervisor (NSX-MH), which works with a select few additional hypervisors but also adds dependencies on a VMware version of Open vSwitch (OVS) that is not part of the community OVS code train.
    • Each of these has a disparate feature set, so ensure you’re assessing a specific product for the features you require.
    • It has been brought to my attention that VMware NSX-MH is being discontinued and will no longer be sold. Some of the features/support it added above NSX for VMware vSphere will be incorporated into NSX over time.

Assuming now that you’ve chosen to go down the path of NSX and chosen the correct product for your needs, let’s discuss how to implement NSX in a holistic network architecture. For the purpose of brevity, we’ll focus on VMware NSX for vSphere and assume that you’re using only VMware hypervisors, as that product requires.

NSX is now providing the ability to inspect and secure your VM traffic. In addition, it can provide routing functionality for the VM data. It’s important to remember that NSX is only able to secure, route, or manage traffic that is within a hypervisor, and in this case a VMware hypervisor. This means that the rest of your physical servers, appliances, mainframe ports, etc. will need another security model and management domain, or the traffic will need to be artificially forced into the hypervisor. The second option will cause bandwidth constraints and non-optimal traffic patterns, which will result in increased latency and degraded user experience.

The other important thing to remember is that NSX is primarily designed for management and security of the VM-to-VM communication. This means that even non-VM-data ports on the hypervisor require special consideration. The following diagram depicts these separate ports.

As shown in the diagram above, a single VMware host utilizes several physical port types for network connectivity. VMware NSX focuses on routing and services for VM data, which is a subset of the host networking requirements. Security, services, and connectivity will need to be applied to host management, vMotion, IP storage ports, etc. Additionally, physical network connectivity, port configuration, and security will need to be implemented for the VM data ports.

With that consideration, it will be important to assess your end-to-end network infrastructure prior to deploying NSX. Especially when working with network security, you don’t want to start down a path laid out with a myopic, virtual-only view. One way to fill the gaps left by the VMware NSX Network Functions Virtualization (NFV) tool is to utilize Cisco Application Centric Infrastructure (ACI) to provide robust, automated, secure network transport. As NSX is simply an application for virtual environments, ACI is able to handle it as it would any other application. Additionally, ACI is built with end-to-end micro-segmentation in place as a native feature of the platform.

Using VMware NSX with Cisco ACI for a holistic security solution:

This section will assume preexisting knowledge of the basic ACI concepts. If you’re not familiar with them, the following short videos will bring you up to speed:


Cisco ACI takes an application requirements approach to automating network deployment. Put simply, ACI automates network provisioning based on the application’s requirements. For example, rather than translating compliance requirements for an application into VLANs, subnets, firewall rules, etc., you simply place PCI-compliant end-points (servers, ports, etc.) in a group dictating the access rules for that group.

ACI groups are called End-Point Groups (EPGs) and can be used to group any objects that require similar connectivity and policy, as described in the video above ‘Application Centric Infrastructure | End-Point Groups’. This grouping can be used to enhance NSX security and automation by providing for physical links, management/vMotion/IP Storage ports, etc. The diagram below shows an example of this configuration.

As shown in the diagram above, ACI will provide segmentation beyond what’s provided by NSX for the VM traffic. This allows the NSX VM data ports to be placed into groups as required, while grouping and segmenting the ports not managed by NSX, as well as physical server and appliance ports. This allows for a complete security and automation stance beyond the limitations of VMware NSX.

ACI additionally provides line-rate gateway functionality from any port to any port. NSX relies on encapsulation techniques such as VxLAN to traverse the underlying IP network. In order for this encapsulated VM traffic to communicate with other devices, such as the 30+ percent of workloads that aren’t virtualized using traditional VM techniques, it must be de-encapsulated by a gateway. In an NSX-only design that role falls to NSX gateway servers, while ACI performs the translation at line-rate on any port. This reduces the x86 requirements of VMware NSX and removes performance bottlenecks caused by NSX gateway servers. The following diagram depicts a 3-tier web application utilizing ACI to provide traffic normalization for both virtualized and non-virtualized servers.


The diagram shows the standard ports used by the VMware hosts as well as two EPGs used to group NSX Web and App tier VMs. The database tier is shown running on physical servers, which together with the VMs create the complete web application using both virtual and physical resources. Typically with this type of design in NSX, gateway servers would be required for encapsulation/de-encapsulation between the virtual and physical environment.

Within ACI, overlay encapsulation is normalized between untagged, 802.1q, and VxLAN, with NVGRE support coming shortly. This means that gateway devices are not needed because the ACI ports handle translation at line-rate. The following diagram shows the encapsulation and traffic flow for this example 3-tier web application.


The diagram above depicts the traffic encapsulation translation that will be done natively by Cisco ACI. User traffic will be 802.1q VLAN tagged or untagged, and need to be sent to the Web Tier via the correct VxLAN tunnel. From there the VxLAN to VxLAN forwarding between the virtualized Web and App tier can be handled by NSX or ACI. Lastly the VxLAN encapsulation will be translated by ACI back to VLAN or untagged traffic and sent to the Database Tier. This will all be handled bi-directionally, significantly reducing latency and overhead.
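Conceptually, encapsulation normalization is a two-step lookup: classify the incoming frame into an EPG by its encapsulation (VLAN ID, VxLAN VNI, or the port for untagged traffic), then send it out using whatever encapsulation the destination EPG expects. The sketch below models only that lookup, with hypothetical VLAN IDs, VNIs, EPG names, and port names; the real fabric does this in switch hardware and also applies policy, which is deliberately left out here.

```python
# Hypothetical mappings from (encapsulation, identifier) to an EPG.
# Untagged traffic is classified by the port it arrives on.
INGRESS = {
    ("vlan", 100): "Users",
    ("vxlan", 5001): "NSX-Web",
    ("vxlan", 5002): "NSX-App",
    ("untagged", "leaf3/eth1/10"): "Database",
}
# Reverse map: the encapsulation each EPG expects on egress.
EGRESS = {epg: encap for encap, epg in INGRESS.items()}


def translate(encap_type, encap_id, dst_epg):
    """Classify a frame into its source EPG and pick the egress encapsulation.

    Policy (whether the two EPGs may talk) is deliberately left out.
    """
    src_epg = INGRESS[(encap_type, encap_id)]
    return src_epg, EGRESS[dst_epg]


print(translate("vlan", 100, "NSX-Web"))      # user VLAN -> Web tier VxLAN
print(translate("vxlan", 5002, "Database"))   # App tier VxLAN -> untagged DB port
```

Because classification happens on the EPG rather than on the wire format, the same policy applies whether a tier sits behind a VLAN, a VxLAN tunnel, or a plain untagged physical port.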


When NSX is chosen for virtual network automation or security, there are several factors that must be considered. The most significant of these is handling traffic that NSX does not support, such as physical servers/appliances, unsupported hypervisors, non-hypervisor-based containers, etc. In order to provide end-to-end security and automation, the physical transport network is the best enforcement point, as it handles all traffic within the data center. Cisco ACI provides a robust, automated, and secured network that greatly enhances the NFV functionality provided by NSX for virtual machine traffic.

** Disclaimer: I work for Cisco as a Principal Engineer focused on Cisco ACI and Nexus 9000 product lines; this means you’re welcome to write all of this off as biased drivel. **

For more information on Cisco ACI:


Micro-segmentation: Data Center Microsegmentation - Enhance Security for Data Center Traffic


VMunderground Opening Acts SDN Panel

I had the opportunity to present on a great SDN  panel of networking and virtualization experts during VMunderground Opening Acts.  The panel represents 3 different vendors, one independent networking expert, and one Value Added Reseller (VAR.)

The Panel from left to right:

  • Chris Wahl (@chriswahl) -  Moderator
  • Myself (@jonisick) (My casual attire and early departure were to sprint over to V0dgeball in support of The Wounded Warrior Project.)
  • Lisa Caywood (@reallisac)
  • Tom Hollingsworth (@networkingnerd)
  • Ryan Hughes (@angryjesters)
  • Scott Lowe (@scott_lowe)



A Lesson on Infrastructure from Nigeria – SDN and Networking

I recently took an amazing trip focused on launching Cisco Application Centric Infrastructure (ACI) across Africa (I work as a Technical Marketing Engineer for the Cisco BU responsible for ACI.) During the trip I learned as much as I was there to share. One of the more interesting lessons I learned was about the importance of infrastructure, and the parallels that can be drawn to networking. Lagos, Nigeria was the inspiration for this lesson. Before beginning, let me state for the record that I enjoyed my trip, the people I had the pleasure to work with, and the parts of the culture I was able to experience. This is simply an observation of the infrastructure and its parallels to data center networks.

Nigeria is known as the ‘Giant of Africa’ because of its population and economy.  It has explosive birth rates which have brought it quickly to 174 million inhabitants, and its GDP has become the largest in Africa at $500 billion.  This GDP is primarily oil based (40%) and surpasses South Africa with its mining, banking, trade, and agricultural industries.  Nigeria also has a large and quickly growing telecommunications sector, and a highly developed financial services sector.  Even more important is that Nigeria is expected to be one of the world’s top 20 economies by 2050.  (Source:

With this and several other industries and natural resources, Nigeria has the potential to quickly become a very dominant player on the global stage. The issue the country faces is that all of this industry is dependent on one thing: infrastructure. Government, transportation, electrical, telecommunications, water, security, etc. infrastructure is required to deliver on the value of these industries. The Nigerian infrastructure is abysmal.

Corruption is rampant at all stages, instantly apparent before even passing through immigration at the airport.  Once outside of the airport if you travel the roads, especially at night you can expect to be stopped at roadside checkpoints and asked for a ‘gift’ from armed military or police forces.  This is definitely not a problem unique to Nigeria, but having travelled to many similar places I found it to be much more in your face, and ingrained in the overall system.

Those same roads that require gifts to travel on are commonly hard-packed dirt or deteriorating pavement. Large potholes filled with water are scattered across the roadways, making travel difficult. Intersections are typically unmarked and lack traffic signals, stop signs, or yield signs. Traffic chokes the streets, and even short trips can take hours because traffic is so unpredictable.


The electrical grid is fragile and unstable, with frequent brownouts throughout the day. In some areas power is on for a day or two at a time, followed by days of darkness. In the nicer complexes, generators are used to fill the gaps. The hotel we stayed at was part of a very nice global chain, and the power still went out briefly several times a day while the generator kicked back in.

The overall security infrastructure of Nigeria has issues of its own.  Because of the weaknesses in central security most any business establishment you enter will have its own security.  This means you’ll go through metal detectors, x-rays, pat-downs, car searches, etc before entering most places.

Additionally, you may be required to hire private security while in country, depending on your employer. Private security is always a catch-22: to be secure you hire security, but by having security you become a more prominent target. As a basic example of this, one can assume that someone who can afford private security guards must be important enough, to someone, to be worth a ransom.

All of these aspects pose significant challenges to doing business in Nigeria. The roads and security issues mean that you’ll spend far more time than necessary getting between meetings. You’ll have the unpredictable travel times, the added time of going through security at each end, parking challenges, etc. Along the way you may encounter checkpoints that demand gifts. The power may pose a problem depending on the generator capabilities of the locations you’re visiting.

All of these issues choke the profitability of doing business in countries like this. They also make doing business in these countries more difficult. Some simple examples of this would be companies that simply choose not to send staff due to security reasons, or individual employees who are not comfortable travelling to these types of locations. It’s far easier to find someone who’s willing to travel the expanse of the European Union, with its solid infrastructure, relative safety, etc., than it may be to find people willing to travel to such locations.

All of this quickly drew a parallel in my mind to the current change going on within data center networks, specifically Software Defined Networking (SDN.)  SDN has the potential to drive new revenue streams in commercial business, and more quickly/efficiently accomplish the mission at hand for non-commercial organizations.  That being said, SDN will always be limited by the infrastructure that supports it.

A lot of talk around SDN focuses on software solutions that ride on top of existing networking equipment in order to provide features x, y and z.  Very little talk is given to the networking equipment below.  This will quickly become an issue for organizations looking to improve the application and service delivery of their data center.  Like Nigeria, these business services will be hindered by the infrastructure that supports them.

Looking at today’s networks, many are not far off from the roads pictured above. We have 20+ years of quick fixes, protocol band-aids, and duct tape layered on to fix point problems before moving on to the next. The physical transport of the network has become extremely complex.

Beyond these issues, there are new physical requirements for today’s data center traffic. 1 Gig server links are saturated and quickly transitioning to 10 Gig. 10 Gig adoption at the access layer is driving demand for higher speeds at the aggregation and core layers, including 40 Gig and above. The need for more raw bandwidth cannot be solved by software alone. A congested network with additional overlay headers will simply become a more congested network.
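A quick bit of arithmetic shows why overlays don’t help a congested link. VxLAN over IPv4 wraps every inner Ethernet frame in an outer Ethernet header (14 bytes), outer IPv4 header (20), UDP header (8), and VxLAN header (8), for 50 bytes per packet. The sketch below computes what fraction of a saturated link that overhead consumes. It assumes no outer 802.1q tag and no IP options, and ignores preamble, inter-frame gap, and FCS; the frame sizes are just sample points.

```python
# VxLAN over IPv4 adds these outer headers to every inner Ethernet frame
# (assuming no outer 802.1q tag and no IP options).
OUTER_ETHERNET = 14
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HEADER = 8
OVERHEAD = OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER  # 50 bytes


def overhead_share(inner_frame_bytes):
    """Fraction of bytes on the wire that is encapsulation overhead."""
    return OVERHEAD / (inner_frame_bytes + OVERHEAD)


def goodput_gbps(link_gbps, inner_frame_bytes):
    """Inner-frame throughput left on a saturated link after encapsulation."""
    return link_gbps * (1 - overhead_share(inner_frame_bytes))


for size in (128, 512, 1514):
    print(f"{size:>5}-byte frames: {overhead_share(size):.1%} overhead, "
          f"{goodput_gbps(10, size):.2f} Gb/s of inner traffic on 10 Gig")
```

For full-size frames the tax is only about 3 percent, but for small frames it exceeds 25 percent, and either way it comes out of bandwidth that was already saturated.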

A more systemic problem is the network designs themselves.  The most prominent network design in the data center is the 3-tier design.  This design consists of some combination of logical and physical Access, Aggregation and Core tiers.  In some cases one or more tiers are collapsed, based on size and scale, but the logical topology remains the same.  These designs are based on traditional North/South traffic patterns. With these traffic patterns, data is primarily coming into the data center through the core (north) and being sent south to the server for processing, then back out.  Today, the majority of the data center traffic travels East/West between servers.  This can be multi-tier applications, distributed applications, etc.  The change in traffic pattern puts constraints on the traditional designs.

The first constraint is the traffic flow itself.  As shown in the diagram below, traffic is typically sent to the aggregation tier for policy enforcement (security, user experience, etc.)  This pattern causes a ping-pong effect for traffic moving between server ports.

Equally important is the design of the hardware in place today. Networking hardware in the data center is typically oversubscribed to reduce cost. This means that while a switch may offer 48x 10 Gig ports, its hardware design may only offer a portion of that total bandwidth. This is done with two assumptions:

1) the traffic will eventually be egressing the data center network on slower WAN links

2) not all ports will be attempting to send packets at full-rate at the same time.

With the way in which modern applications are being built and used, this is no longer the case.  Due to the distribution of applications we more often have 1 Gig or 10 Gig server ports communicating with other 1 Gig or 10 Gig ports.  Additionally many applications will actually attempt to push all ports at line-rate at one time.  Big data applications are a common example of this.
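One common place oversubscription shows up is the ratio between a switch’s server-facing bandwidth and its uplink bandwidth. The sketch below works that out for the 48x 10 Gig switch from the text, paired with a hypothetical 4x 40 Gig of uplinks (the uplink count is an assumption for illustration), and shows what each server port gets when every port pushes line-rate at once, as big data workloads do.

```python
from fractions import Fraction


def oversubscription(down_ports, down_gbps, up_ports, up_gbps):
    """Ratio of server-facing bandwidth to uplink bandwidth on one switch."""
    return Fraction(down_ports * down_gbps, up_ports * up_gbps)


def fair_share_gbps(down_ports, up_ports, up_gbps):
    """Uplink bandwidth per server port if every port transmits upstream at once."""
    return Fraction(up_ports * up_gbps, down_ports)


# 48x 10 Gig server ports (from the text) with hypothetical 4x 40 Gig uplinks
ratio = oversubscription(48, 10, 4, 40)
share = fair_share_gbps(48, 4, 40)
print(f"oversubscription {ratio.numerator}:{ratio.denominator}")
print(f"per-port share at full load: {float(share):.2f} Gb/s of 10")
```

With these numbers the switch is 3:1 oversubscribed, so under full load each 10 Gig server port effectively gets about a third of its line-rate toward the rest of the network.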

The new traffic demands in the data center require new hardware designs and new network topologies. Most modern network hardware solutions are designed for full-rate non-blocking traffic, or as close to it as possible. Additionally, the designs being recommended by most vendors today are flatter two-tier architectures known as Spine/Leaf or Clos architectures. These designs lend themselves well to scalability and consistent latency between servers, service appliances (virtual/physical) and WAN or data center interconnect links.
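The "consistent latency" property comes from the topology itself: in a spine/leaf design every leaf connects to every spine, so any two leaves are the same number of hops apart, with one equal-cost path per spine. The sketch below checks that with a breadth-first search that counts shortest paths. The fabric size (4 spines, 8 leaves) is hypothetical, and the model ignores link speeds and how real ECMP hashing spreads flows.

```python
from collections import deque
from itertools import combinations


def spine_leaf(num_spines, num_leaves):
    """Adjacency map of a two-tier Clos: every leaf connects to every spine."""
    spines = [f"spine{i}" for i in range(1, num_spines + 1)]
    leaves = [f"leaf{i}" for i in range(1, num_leaves + 1)]
    adj = {node: set() for node in spines + leaves}
    for spine in spines:
        for leaf in leaves:
            adj[spine].add(leaf)
            adj[leaf].add(spine)
    return adj, leaves


def shortest_paths(adj, src, dst):
    """BFS returning (hop count, number of distinct shortest paths) from src to dst."""
    dist = {src: 0}
    count = {src: 1}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                # First time reached: one hop further than the current node.
                dist[nxt] = dist[node] + 1
                count[nxt] = count[node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                # Another equally short way to reach nxt.
                count[nxt] += count[node]
    return dist[dst], count[dst]


adj, leaves = spine_leaf(num_spines=4, num_leaves=8)
results = {shortest_paths(adj, a, b) for a, b in combinations(leaves, 2)}
print(results)  # {(2, 4)}: every leaf pair is 2 hops apart over 4 equal-cost paths
```

Adding spines increases the number of equal-cost paths (and bandwidth) without changing the hop count, which is why these designs scale out while keeping server-to-server latency predictable.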

Like Nigeria, our business solutions will only be as effective as the infrastructure that supports them. We can, of course, move forward and grow at some rate, for some time, by layering over the top of the existing infrastructure, but we’ll be limited. At some point in the future we’ll need to overhaul the infrastructure itself to support the full potential of the services that ride on top of it.


Next Generation Networking Panel With Brad Hedlund – Cisco ACI vs. VMware NSX


Seeing the Big Picture – Job Rotation

A mini-rant I went on this evening prompted Jason Edelman (@jedelman) to suggest I write a blog on the topic. My rant was about job rotation: specifically, in the IT vendor world, rotating from the field (sales engineering, professional services, technical support) to the business units building the products, and vice versa. This is all about perspective.

In the past I’ve written about pre-sales engineering:

The Art of Pre-Sales:

The Art of Pre-Sales Part II – Showing Value:

Pre-sales engineering (at Value Added Resellers) has been my background for quite a while. About a year and a half ago I moved over to the vendor side, specifically to a business unit bringing brand new products to market. This has been an eye-opening experience, to say the least.

What I’ve Learned:

  1. Building a product, and bringing it to market is completely different from using it to architect solutions and selling it.
    • This one almost goes without saying. When you’re selling a product or architecting solutions, you are focused on using a solution to solve a problem. Both of these should be knowns.
    • If you’re a good architect/engineer, you’re using the two ears and one mouth you’re given in proportion to identify the problem. Then you select the best-fit solution for that problem from your bag of tricks.
    • When you’re building a product you’re trying to identify a broad problem, market shift, or industry gap and create a solution for that.  The technology is only one small piece of this.  The other focuses include:
      • The Total Addressable Market (TAM)
      • Business impact/disruption to current products
      • Marketing
      • Training/education
      • Adoption
      • Sales ramp
  2. A lot of those “WTF were they thinking” questions, have valid answers.
    • Ever sat back and asked yourself ‘What were they thinking?’  9 times out of 10, they were. 
      • 9 out of 10 of those times they were thinking TAM. 
        • We tend to work in specific verticals (as customers or vendors): health care, government, service-provider, etc.  What seems like a big requirement in our view, may not be significant in the big picture being addressed.
        • Shifting markets and therefore shifting TAM.  Where you may be selling a lot of feature x today, that may be a shrinking market/requirement in the big picture.
      • Complexity/down-the-road costs. In many cases implementing feature x will complicate features A, B, C, and D. It may complicate QA processes, slow feature adoption, add cost, etc.
  3. Everything has trade-offs.
    • Nothing is free, engineering resources are limited, time is limited, budget is limited.  This means that tough decisions have to be made.  Tough decisions are made in every roadmap meeting and most of those end with features on the chopping block, or pushed out.
    • Anything added typically means something removed or delayed.  In cases where it doesn’t, it probably means a date will slip.

The Flip Side:

This is not a one way street.  The flip side is true as well.  My lifetime of experience on the other side gives me a far different perspective than many of my colleagues.  Some of my colleagues have always lived in the ‘ivory tower’ I now live with them in.  Without having experience in the field it’s hard to empathize with specific requests, needs, complaints or concerns.  It’s hard to really be in touch with the day to day requirements and problems.  Having the other perspective is beneficial all around.

So What?

  • If you’re an IT vendor find ways to open up job rotation practices.  3-6 month rotations every 2-3 years would be ideal, but anything is a start.  Advertise the options, encourage managers to support it, promote it.
  • If you’re an individual, suggest the program.  Beyond suggesting the program search out opportunities.  It’s always beneficial to have a broader understanding of the company you work for, trying different roles will help with this. 
  • Even if neither of these things can happen, find ways to engage often with your counterparts on the other side of the fence and listen.  The more you understand their point of view the easier it will be to find win/win solutions.