I spent the better part of the last decade at various startups and web companies, but one of my recent consulting gigs led me to a Fortune 500 company. I’ve done work at large enterprises before, but I had really forgotten what it’s like, and coming back was a rather jarring experience. I entered a deep, dark world of enterprise architecture, frameworks, meaningless acronyms and a cesspool of “enterprise” software that seems to breed and reproduce uncontrollably. It’s a place with abstraction at every layer, except anywhere that’s relevant.
Sometimes I felt as if I had been warped at least 10 years back in time and everyone around me was moving at a different speed. To paraphrase a famous quote: “It’s not that they are lazy, it’s just that they don’t care”.
I do have to mention some caveats. These are purely observations on IT/Ops and I had barely any idea what was happening on the dev side (which is a problem in itself). I also didn’t have visibility into every part of the organization, so perhaps everything is wonderful in other areas, though I have my doubts.
Meetings seem to be the favorite pastime in the company: tedious, mind-numbing and soul-sucking. Granted, this isn’t new or unique to these places, but their sheer quantity and utter lack of effectiveness are still surprising. A trivial problem requires at least 5–7 people to sit in a meeting for an hour, even though the solution is obvious and the issue could be resolved in under 10 minutes. That is followed by another meeting where people try to decide whether to set up yet another meeting to update management on the results of the previous one. Things like group chats, internal IRC or Basecamp don’t exist.
Collaboration between teams? Yeah, right. It’s an absolutely foreign concept. Here is an example: I had the misfortune of reviewing a supporting system for an application that was designed with one goal in mind: to let Dev bypass IT. It cost a boatload of money, created a ton of overhead and introduced a myriad of new dependencies. It could have been solved by a sysadmin talking to a developer and suggesting how to use an existing system in one of a hundred different ways. That would have given Dev exactly the freedom they needed without compromising anything else. Though I did get a sense that anyone attempting this would’ve been ostracized for betrayal. There are dozens of cases like this, and not all of them come from Dev.
- Nearly all business units dread dealing with IT. That’s a clear sign you’re not meeting the company’s needs.
- No communication with development. There is a wall with barbed wire and plenty of mutual hate to go around between IT and Dev. Naturally, Dev tries to circumvent IT whenever they can, and the result is sheer absurdity.
- Bad communication within IT itself. Each group (network, storage, systems, etc.) effectively lives on an island. Talking shouldn’t have to go only through managers, PMs and “liaisons”.
- IT is constantly on the defensive. The prevalent attitude is that it’s “our” datacenter and we must hold the line at all costs to protect it.
- Automation, measurement (metrics) and standardization aren’t considered important.
For a company that doesn’t sell software to consumers, continuous delivery may not be a priority and may naturally take a back seat to stability. However, the focus on automation needs to happen regardless of whether there are 50 deploys a day or one per month, because it delivers better business outcomes.
I wouldn’t expect an enormous company to fully buy into a nascent (in enterprise timelines) movement, but devops is an answer to a specific set of problems, and it was practiced at a lot of companies before there was a name for it. It’s an open question whether it can really scale to a company this size and whether all the silos can be broken down with that many people. You still need people who specialize, and I wouldn’t expect a DBA to be an expert in the intricacies of BGP routing. But you must be able to see across the stack and have understanding of and visibility into how other areas impact your domain of expertise. Even if you never get all the way there, at least an attempt should be made to fix the obvious problems, and it has to start with a change in culture.
Buying rather than building is a pretty standard approach at Fortune 500 companies. Everything has to be an “enterprise” application, even when there is a better open source alternative. But this company took it to a whole other level. One of its IT guiding principles is to buy a solution first and foremost. Now, I’ll be the first to say that no extreme is a good idea; at its worst, “Not Invented Here” syndrome can be just as bad. But there has to be a middle ground somewhere. When everything that you run is bought from BigSoftwareCo/BigHardwareCo, here is what happens:
- You spend non-trivial amounts of time dealing with various vendors and their bugs/issues, etc.
- You live at the mercy of the vendor.
- You lack ownership and internal knowledge of your systems and more importantly: you don’t build internal core expertise.
- The costs are staggering.
- The mentality of IT becomes “call the vendor” rather than “how do I fix it”.
Process is critical to a well-run IT organization. Approaches like ITSM, ITIL, ISO and Zachman or TOGAF frameworks can and should be used. Even without a formal framework, there should be a clear process in IT. If a good process doesn’t exist, IT is just a house built on sand. It’s what gives IT discipline and creates consistency, accountability and reliability. At this particular company:
- The frameworks are applied to the process, but not the actual systems. There is a fundamental disconnect between process, measurement and technology.
- Missing the forest for the trees. The process isn’t a goal in itself. It’s only important because it enables the company to achieve something that creates business value.
- No incentive to improve. Nobody ever discusses the fact that a simple task that should take less than a day takes a week because of inefficiency, lack of automation and artificial obstacles.
- That’s the way it’s always been done around there. It’s a very insular culture that is highly resistant to change.
I’ve sat in two-hour post-mortem meetings where the entire discussion was about process and which PM was at fault, without a single minute spent on the actual problem: a bad technical design. To be honest, I have to remember to “Never ascribe to malice that which is adequately explained by incompetence”. Even that’s probably too harsh. Most people simply don’t know better, and it’s more of an organizational failure than anything else.
You shouldn’t be a slave to a process that doesn’t serve its purpose; you should adjust it as necessary. At this company, however, the process is enshrined in everything. It is the be-all and end-all of IT. It’s like a cancer that has spread throughout the company and squeezed out every bit of innovation and agility. It does buy them reliability and predictability, but everything else is sacrificed on its altar. It reminds me of an old Dilbert comic.
Cloud is a pretty loaded word, co-opted by marketers and salespeople. By cloud, I mean a level of abstraction that allows you to take a different approach to managing and architecting your systems. That doesn’t mean forklifting their infrastructure to AWS. They have plenty of resources to deploy a private or hybrid cloud, but it’s not even part of the discussion or their long-term strategy.
I brought it up in a few conversations, and the general attitude seemed to be: cloud is for tiny companies, not a “serious enterprise” like us. The fact that a lot of these “tiny” companies handle more traffic and solve far more complex problems is completely lost on them. Besides, Netflix isn’t exactly “tiny”, and there are many others like them. This could be a bit of the Dunning-Kruger effect, since a significant majority of people in IT have been rooted there forever and have never had much exposure to different technologies.
The lack of unified standards or a platform is somewhat odd. My initial expectation was to see a set of rigid and outdated standards that are difficult or impossible to change, but boy was I wrong.
- Standards exist, but they are all contained within smaller groups in IT. The network, storage, systems, security and DBA teams all have their own approaches.
- No unified architecture. The only thing that keeps everything together is the aforementioned process.
- Complete lack of configuration management, automation, standardized logging and monitoring, dynamic inventory and provisioning and auto-scaling. Forget about a standard computing framework or APIs.
- Infrastructure as a series of static, monolithic blocks. It’s almost like a Lego game from hell, where some pieces don’t line up and others are crazy-glued together.
- Seemingly no ability or desire to address the underlying causes. Typically they’ll play whack-a-mole and deal with the symptoms.
If an application gets deployed, migrated or upgraded, a massive team gets together to figure it out, rather than the application dynamically plugging into a predictable setup. Hardware rules the day and automatic scalability is out of the question. VMware is heavily used, but everything still boils down to hardware requirements. An immense amount of time is spent figuring out (read: guessing) memory/CPU/server counts/storage, rather than load testing and dynamically acquiring compute, storage, network and software management resources.
Part of it stems from cost management and accrual accounting, but you can still do chargeback on a platform approach. Unfortunately, the shift away from the “I need x servers with x CPUs and x RAM each” mentality looks to be years away.
To be fair, poor documentation isn’t a problem exclusive to these types of companies. It happens in places of all sizes and tends to get worse with time. The knowledge management problem has existed as long as IT has, and there isn’t a clear-cut answer to it. The key is culture and a commitment to quality documentation. This was probably one of the weakest areas that I observed.
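To make the chargeback point a bit more concrete, here is a minimal sketch, assuming the platform meters each business unit’s consumption (vCPU-hours and RAM GB-hours in this example) and splits a fixed monthly platform cost in proportion to what each unit used. The `Usage` type, the `chargeback` function, the 50/50 CPU/RAM weighting and all numbers are hypothetical and purely illustrative; real chargeback models are usually more involved.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Metered consumption of a shared platform by one business unit (illustrative units)."""
    unit: str
    vcpu_hours: float
    ram_gb_hours: float


def chargeback(usages, monthly_cost, cpu_weight=0.5):
    """Split the platform's monthly cost across units in proportion to what they consumed.

    cpu_weight is an assumed policy knob: the fraction of the cost attributed to CPU;
    the remainder is attributed to RAM. If nobody used a resource, nobody is billed for it.
    """
    total_cpu = sum(u.vcpu_hours for u in usages)
    total_ram = sum(u.ram_gb_hours for u in usages)
    bills = {}
    for u in usages:
        cpu_share = u.vcpu_hours / total_cpu if total_cpu else 0.0
        ram_share = u.ram_gb_hours / total_ram if total_ram else 0.0
        share = cpu_weight * cpu_share + (1 - cpu_weight) * ram_share
        bills[u.unit] = round(monthly_cost * share, 2)
    return bills


# Example: two business units sharing one platform that costs 10,000 per month.
usages = [Usage("finance", 300, 1200), Usage("marketing", 100, 400)]
print(chargeback(usages, 10_000))  # {'finance': 7500.0, 'marketing': 2500.0}
```

The idea is that business units get billed for what they actually consume on a shared platform, rather than for a fixed server count requested up front.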
- There is an utter deluge of information and data which resembles an unstructured brain dump.
- No consolidation of information, much less any standardization. There is no “single source of truth”. Data exists on file shares, SharePoint (ugh!), portals and a dozen applications, not to mention people’s heads. Information is stored in PDFs, Excel files, Visio diagrams, Word documents and so on.
- Absolute lack of standards as it applies to documentation. Most diagrams are cluttered with useless details that don’t convey anything relevant. Shapes have no data assigned or linked to them, just text blobs. Nothing is linked or cross-referenced or can be queried. There is no metamodeling, reference architecture, standard UML or toolset for enterprise architecture. There isn’t even a common vocabulary.
If they spent as much time standardizing and unifying their knowledge and enterprise architecture as they did on their process, they’d be in much better shape.
The lack of visibility ties directly into poor documentation. Nearly everything is static and usually outdated by the time you read it. In one project, I asked for information on the architecture and workflow of their standard enterprise authentication/user provisioning model, and in return I got empty stares. Even something as basic as that doesn’t exist. There are half a dozen components to it, and apparently no one understands how they tie together.
- Systems data isn’t available in real or near real time. In cases where data is available, access is generally highly restricted.
- No aggregation and correlation of data. It exists across many applications with no easy way to correlate.
- Getting information depends on finding and talking to the “right” people, which is a huge time sink and yet another impediment to change. Digging into the business rules, the “whys” and the workflows would take weeks if not months.
The ideal scenario would be to have a standardized information format and the ability to dynamically query for relevant data. There should be a real-time feedback loop between and into different parts of IT. If someone were interested in application XYZ, within minutes they should be able to get the underlying hardware (or VMs), workflow, architecture, utilization/capacity, names, IPs, dependencies and so on. At the very least a wiki would be nice; ideally you’d have proper system modeling that addresses components, policies, dependencies, etc.
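As a rough illustration of what “dynamically query for relevant data” could look like, here is a minimal sketch of a linked system model: each application knows its owner, its hosts and what it depends on, so a single call answers the application XYZ question, including transitive dependencies. Everything here (`Inventory`, `App`, `Host`, the sample names and IPs) is hypothetical; a real setup would sit on top of an actual CMDB or discovery tooling fed by configuration management, not an in-memory dictionary.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    ip: str
    kind: str  # e.g. "vm" or "physical"


@dataclass
class App:
    name: str
    owner: str
    hosts: list = field(default_factory=list)
    depends_on: list = field(default_factory=list)  # names of other apps


class Inventory:
    """Hypothetical in-memory stand-in for a queryable system model (CMDB)."""

    def __init__(self):
        self.apps = {}

    def add(self, app):
        self.apps[app.name] = app

    def dependencies(self, name):
        """All apps `name` depends on, directly or transitively, in breadth-first order.

        Cycles are tolerated. Unregistered dependencies are still reported
        but cannot be expanded further.
        """
        seen = {name}
        order = []
        queue = deque(self.apps[name].depends_on)
        while queue:
            dep = queue.popleft()
            if dep in seen:
                continue
            seen.add(dep)
            order.append(dep)
            if dep in self.apps:
                queue.extend(self.apps[dep].depends_on)
        return order

    def describe(self, name):
        """Answer "what is application XYZ?" in a single query."""
        app = self.apps[name]
        return {
            "owner": app.owner,
            "hosts": [(h.name, h.ip, h.kind) for h in app.hosts],
            "depends_on": self.dependencies(name),
        }


# Illustrative data only: every name and IP below is made up.
inv = Inventory()
inv.add(App("xyz", "payments", [Host("web01", "10.0.0.11", "vm")], ["auth", "orders-db"]))
inv.add(App("auth", "identity", [Host("idm01", "10.0.1.5", "vm")], ["ldap"]))
inv.add(App("orders-db", "dba", [Host("db01", "10.0.2.7", "physical")]))
inv.add(App("ldap", "identity", [Host("ldap01", "10.0.1.9", "vm")]))
print(inv.describe("xyz"))
# {'owner': 'payments', 'hosts': [('web01', '10.0.0.11', 'vm')], 'depends_on': ['auth', 'orders-db', 'ldap']}
```

The important part isn’t the code but the linking: once components, hosts and dependencies reference each other as data, questions that currently take weeks of tracking down the “right” people become a query.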
I ought to be somewhat careful here and not paint everyone with the same brush. There are some very technically skilled individuals and I am sure there are many others that I haven’t met. This isn’t a character judgement by any means and a vast majority of people are polite and friendly.
However, a significant percentage of “IT” people have become masters of process and internal bureaucracy. They are very skilled at navigating internal politics and invested in perpetuating the existing way of doing things. Their technical expertise is negligible and their contribution to the end result is minimal, if any. These are roles that I expected to see filled with people who have extensive and deep technical knowledge. To a large extent, it’s a problem with job classification. You shouldn’t be giving technical project managers titles containing words like “designer” and “architect”.
The personnel problem becomes a self-reinforcing mechanism. It reminds me of a quote by Peter Drucker: “A poor organization structure makes good performance impossible, no matter how good the individual managers may be.” People who have the energy, capability and desire to institute change don’t stay at companies like these for very long. That reinforces the mentality that they can’t build complicated solutions on their own. Capacity and desire for innovation get whittled down until there is nothing left.
I am sure that this doesn’t surprise a lot of people. Unfortunately, this is all too commonplace in many companies. It’s not necessarily all horrible, since this operation does achieve certain goals. Reliability and predictability are high, and risk tolerance for a Fortune 500 company is certainly not the same as for a startup. Priorities for a company like this are, and should be, different. Various compliance requirements play a significant role, and they are not going to throw their process out the window overnight. I get all of that, and in many ways they are more right than wrong.
But if a “project” of minimal complexity requires the participation of 5 people, 8 meetings, 15 conference calls, 50 emails with 25 attachments, a SharePoint site, a file share, 3 approvals, 7 diagrams and takes 2 months to execute, it’s not an example of quality of controls, accountability or efficiency. It’s an abject failure, either of classification (what’s a project and what’s a task) or of the efficiency of the process. It vividly demonstrates how a lack of automation and standardization leads to results like this. Even with objectives of stability, cost and accountability taking considerable priority over release time, it still can’t be categorized as anything other than a sheer disaster. Inefficient and redundant processes are a basic IT failure, even with a 100% focus on operational stability.
IT isn’t only a cost center and a necessary evil. It can and should be a source of innovation, competitive advantage and business value. The process should enable this, not stifle it. That can’t happen at an organization like this, because as currently constructed, it simply doesn’t have the capacity and agility to achieve significant change. It doesn’t even have an avenue to release small, potentially high-impact projects. In order for these to succeed, you need to be able to iterate quickly and have a real time feedback loop with key metrics.
What is truly disheartening is that even if they had a CTO or CIO who fully bought into a different vision, trying to change this organization and its culture would be a Sisyphean task. Perhaps creating a skunkworks team that tries to build a different platform might work. However, that may result in parallel infrastructures down the road, not to mention legacy applications that would have to be accounted for. It may produce an even worse result. Maybe cross-functional teams that see a project through its life cycle would do better.
Perhaps I have too much of a utopian view of how IT should operate. Frankly, I don’t know how to fix an organization of this scale, but I do know that it’s broken. In my mind, this is a dysfunctional path for IT and a dead end. There is nothing wrong with looking at IT as a utility, but you shouldn’t run it as a myopic monopoly. I am not advocating for cowboy sysadmins running amok, but I can’t help but feel bad for someone out there who probably has 100 ideas about how to make things better, but will never have an opportunity to act on them. This is where the creativity and innovation of IT goes to die.