IT Forecast: Cloudy!

December 20, 2010

Cloud computing is picking up steam (and moisture). I already covered the essentials of cloud computing in two separate posts (start here). As I mentioned in those earlier posts, it is not a new technology or anything innovative. Cloud computing concepts were employed in academia and research for many years, mainly to make the most out of commodity computers. Enterprises started paying attention at the start of the post-Y2K, money-saving, lean-seeking era. I attribute that speedy adoption to three main factors:

1. The increasing cost of ever-expanding data centers, and of purchasing and cooling new servers.
2. The advancement of virtualization and its associated management tools (VMware, Citrix, Microsoft, etc.).
3. Amazon’s EC2 (launched in 2006).

During the economic slowdown of this past decade, businesses looked for opportunities to reduce hardware, cooling, and energy costs, as well as data center and maintenance costs. Hardware manufacturers poured hundreds of millions of dollars into greener, more power-efficient machines; with each new chip generation from AMD or Intel, you get more computing power for less power consumption. That alone wasn’t nearly enough to reduce costs. Companies still had to deal with ever-increasing data center sizes. They moved to consolidate servers, which proved harder than anticipated, as legacy applications weren’t easy to migrate to newer servers with fewer available resources. Advancements in virtualization technology and its associated management tools finally allowed IT to consolidate servers and pack applications onto fewer physical machines (still using those powerful machines, but more efficiently), as CPU utilization increased from under 15% to much higher rates. This allowed a major reduction in data center size, by a factor of 10 on average.

Companies jumped on the virtualization bandwagon as a way to reduce the cost per physical machine and lower the power consumed by idle servers. Virtualization also helped contain rapidly ballooning data centers. But that is not enough. Companies still have to buy expensive servers, maintain them, and manage the overhead (IT staff as well as physical management). That, along with Amazon’s initial major push that got the cloud computing engine started in the enterprise, allowed businesses to take advantage of a new face of an old technology. With cloud computing, companies can outsource major parts of their data centers to an outside cloud service provider. They save on IT overhead (cloud service providers supply their own staff), software management (security patches, upgrades, updates, etc.), hardware management (allocating physical space, cabling servers, rack space, etc.), and hardware cost, since companies no longer need to over-provision their data centers to handle future spikes in client requests, only to return to a normal request cycle after those spikes.

With cloud computing, companies can concentrate on their core business without spending time and effort managing overhead that contributes nothing to the company’s IP or bottom line. Businesses can take advantage of a hybrid cloud to outsource their scalability-hungry applications while keeping in-house those elephant applications that do not change much and do not require run-time scalability. With cloud computing, companies do not need to buy powerful servers. Gone are the days when servers kept getting more and more powerful. With cloud computing, commodity computers are king. I expect many server manufacturers to shift their resources to building either internal (private) cloud-ready servers (replacing the software solutions for building and maintaining cloud-ready infrastructure) or servers with features that allow them to plug into an existing cloud rack (servers stripped down to the bare minimum to yield lightweight computing machines that are green and cost-effective). The reason cloud computing marks the beginning of the end for high-end servers is that, as companies move their infrastructure to the cloud, cloud service providers will realize that to stay competitive in the per-hour resource renting space, they need to lower their per-physical-server cost. To do that, they will need to use virtualization (to maximize income per physical server) and lower physical server costs and the associated power costs. To lower the cost of those servers and their power consumption, cloud service providers will use commodity computers that are cheaper and require less energy. In a world of seemingly unlimited CPU and memory resource pools, there is no need to buy an expensive 64 GB server that costs far more than a pool of commodity computers with the same total amount of memory. Furthermore, commodity computers are stripped down to the bare minimum of features, so they do not require much software management or burn overhead resources. That is why Google builds its own commodity computers instead of buying them.

Businesses will continue to outsource their infrastructure, platforms, and applications to the cloud as they realize they become more productive when they focus on their core business functions rather than on all the bells and whistles needed to make that happen. And as businesses outsource this overhead to companies dedicated to managing it at a much more manageable cost, IT administrators will see their jobs decay away. They will have to find work outside their normal range of functions and acquire a new set of skills, probably in development, as they notice the shift in IT management power from their hands to the end user’s. With cloud computing, the promise of simplified IT management is stronger than ever. An average user will be able to log on to a cloud service provider and manage their own applications and infrastructure through user-friendly management pages without any prior technical background.

Cloud computing is not a revolution, but its adoption this coming decade will be. As the mobile device and notebook markets grow, client devices are becoming thinner and thinner while applications are getting richer and richer. This is only possible with hosted services that are ever-scalable, highly available, and fail-over ready; those are just a few of the promises the cloud makes. Consumers will go after smaller and cheaper devices and terminals, as there will be no need for powerful laptops and desktops anymore. If I can afford to buy five or more dumb, thin, and very small terminals, distribute them around the house, and attach monitors to them, then I can use a VDI solution hosted on a popular cloud service provider to load my desktop (along with my session) on any terminal in my house! My remotely hosted applications will run on a supercomputer-like grid of commodity computers with all the resources they need. I can create custom-made desktop VMs for my children with high levels of control. They can destroy their VM and I can get a new one from the host service in no time! No slow computers, and no dropped and broken computers. No ten wires per machine (just one for the terminal). This is going to be the new generation of personal computing within the next few years. I may be able to hook a 17-inch LCD to my iPhone and see my VM, hosted on RackSpace.com, on the LCD as if it were connected to a very powerful desktop! How about eliminating the need for an LCD and using a projector instead? Maybe my smartphone will allow for such a projection, letting me take a very powerful computer wherever I go without losing speed, sacrificing battery power, or even giving up screen size!

No one will benefit more from cloud computing than government, small-budget businesses, and not-for-profit organizations. Buying small, thin terminals (perhaps the size of one’s palm, or even finger-sized) and investing most of the money in the data center (private clouds) or in purchasing more and more services from the public cloud would lower costs. No more worrying about backup, scalability, compliance, licensing, clustering, fail-over, high availability, bandwidth, memory, replication, disaster recovery, security, software upgrades, relocation, etc. Those are all given as promises outlined in detail in a service level agreement (SLA). Even better, those institutions and businesses will be able to deliver the same consistent service across campuses and locations.

Additionally, interoperability and integration (standards are being laid out, and will hopefully solidify into industry-wide accepted standards within a few years) will allow companies to adopt new software and applications with a few check-box selections on the cloud management page for their data center. A company could switch from SQL Server 2008 to MySQL Enterprise with a check-box selection. Users could switch their email clients from one site to another, and so on. Beyond that, a consumer could switch a whole platform from Windows 7 to Ubuntu Linux and back in a few seconds. Platforms and applications become tools and roads rather than destinations. This is good for consumers, because the ease of transitioning in and out of platforms and applications creates opportunity for every business, along with a healthy fear of losing customers. It will also usher in a booming period for open source software (difficult installation and setup processes have kept most open source software out of the public’s hands), as management becomes transparent and standardized.

The next decade will allow some exciting opportunities to unfold as businesses start sprinting in a fast-paced race after a long decade of dieting (they became leaner) and adopting new technologies that let them concentrate on their core business rather than all the extra fat (overhead).

I will write another post dedicated to some of the available cloud services and applications that people and small businesses can use immediately to manage their startup or maintenance costs, without falling behind competitors that use better software and services.

It is hard to forecast what will come next, but one thing is for sure: it will definitely be cloudy!


Cloud Computing Simplified (Part II)

December 14, 2010

This is a continuation of my last post (Cloud Computing Simplified (Part I)).

I was still standing there in front of my friend. I had many questions answered and well articulated inside my head. But, just like taking a test, thoughts are useless if the oval corresponding to the right answer on the answer sheet is not filled in. Her original question was about her company’s website, and whether it was SaaS or the cloud. Before I walk into what appears to be a brain trap, I need to define what SaaS is. After all, I used to tell my students that if they could not explain a highly technical concept to their grandma, it is an indication that they don’t understand it themselves. I was standing in front of a younger version of my grandma, but the principle is the same. The easy explanation of SaaS is simply “Software as a Service”. Great. Case closed. Not really. What exactly is software? And what is a service? And what does it mean to provide software as a service? We know what software is: any application that you use day to day. A service means you have a black box sitting somewhere else (the cloud?) that takes input from a user and produces an answer. Think of something like a calculator: it takes numbers and operators and produces the answer. There are a few characteristics of services. First, a service has to be stateless (it does not remember you the second time you call it, and it doesn’t discriminate against whoever is calling it). Second, it has to produce one consistent answer (no matter how many times or what time of the day you call the same service with the same arguments, you should always end up with the same answer – unless you are calling the getCurrentTime service :)). The last property of a service is that it is reachable via TCP/IP calls (not via a mailed letter through the post office).
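To make that concrete, here is a minimal sketch of such a service in Java, using the small HTTP server that ships with the JDK. The endpoint, port, and class names are just illustrative choices on my part, not anything tied to a particular cloud product.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;

import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;

// A toy "calculator" service: stateless (it keeps no memory of callers),
// consistent (the same input always yields the same answer), and reachable
// over TCP/IP (plain HTTP).
public class AdditionService {
    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/add", new HttpHandler() {
            public void handle(HttpExchange exchange) throws IOException {
                long a = 0, b = 0;
                String query = exchange.getRequestURI().getQuery(); // e.g. a=2&b=3
                if (query != null) {
                    for (String pair : query.split("&")) {
                        String[] kv = pair.split("=");
                        if (kv.length == 2 && kv[0].equals("a")) a = Long.parseLong(kv[1]);
                        if (kv.length == 2 && kv[0].equals("b")) b = Long.parseLong(kv[1]);
                    }
                }
                byte[] body = String.valueOf(a + b).getBytes("UTF-8");
                exchange.sendResponseHeaders(200, body.length);
                OutputStream out = exchange.getResponseBody();
                out.write(body);
                out.close();
            }
        });
        server.start(); // GET http://localhost:8080/add?a=2&b=3 returns 5
    }
}
```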

So, providing the application or software as a service that is accessible from anywhere (since the web runs on HTTP, a wrapper protocol on top of TCP/IP), by anyone (the no-discrimination principle), and always consistently, is what SaaS is all about. And because the software is exposed as a service callable by anyone from anywhere, you can also write code that calls this software from another piece of software! This is a powerful concept on the cloud, where applications become interoperable through proprietary-free calls. It allows not only scalability across regions of the Internet but also interoperability among different cloud service providers (such as Microsoft, Amazon, and RackSpace)!
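And “software calling software” could look as simple as this: a few lines of standard Java consuming the toy /add service from the previous sketch over plain HTTP. Nothing here is specific to any cloud provider; it only assumes that hypothetical service is running.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Another application (not a human in a browser) calling the toy service.
public class AdditionClient {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:8080/add?a=2&b=3");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"));
        System.out.println("2 + 3 = " + in.readLine()); // prints: 2 + 3 = 5
        in.close();
        conn.disconnect();
    }
}
```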

The same definition of SaaS applies to all the other ?aaS resources. IaaS is nothing but offering hardware/infrastructure resources via TCP/IP API calls (Amazon’s S3 is a leader in this space). PaaS is offering enabling platforms (such as Force.com and Google App Engine) that allow developers to create SaaS on top of those platforms in the cloud (such as Salesforce.com and Google Apps). There are other ?aaS flavors such as DaaS (which stands for Data as a Service if you are talking to a data provider, or Desktop as a Service if you are talking to Citrix, Cisco, or VMware), etc.

The reason ?aaS offerings are provided via publicly accessible APIs is to hand the power of management to the end user (as opposed to the IT administrator who has been hogging power over computing resources for a long time). Those public APIs have allowed many third-party companies to create pluggable, easy-to-use management tools that manipulate those ?aaS resources on the fly. Yes, that is right: the cloud changes (or evolves?) the concept of the Internet from a relationship where the end user is a helpless recipient of whatever is given to him, to a more empowered user who can at least interact with websites and public services at a minimal level (via Web 2.0 and Ajax-driven applications), to one where he is completely in control: he manipulates not only what the service has to say but also how many resources are available to it, all via a user-friendly interface to the cloud service provider’s data center. Much like TurboTax, user-friendly interfaces to cloud providers’ websites empower average cyber-Joes to design their own data centers and public applications with a 16-digit magic code (a.k.a. a credit card number). I think those “user-friendly” interfaces still have a long way to go before they become truly friendly, but what we have today is a good start.

That is it. I have made my friend wait too long, and if I don’t let the words out soon, I may as well switch topics and improvise a weather joke that includes the word “cloud”. She did ask whether her website is SaaS. Let’s examine that for a second. Is a website software? That is harder to answer than it seems. Wikipedia defines software as “… a general term primarily used for digitally stored data such as computer programs and other kinds of information read and written by computers.” That would imply that websites count as software. This is not how it looks from a developer’s perspective, where a website seems to be just the outcome or result of software. I actually agree with that intuition and declare that websites are results of software, not software themselves, just as this blog is not considered software but a few paragraphs of text. So, since a website is not software, it cannot be SaaS. The code that produces the website is software, and it is running on a server somewhere; however, although it is running on a host, the software itself is not available as a service. No one interacts with the software itself. Users interact with its results (the website). Users cannot write code that plugs into the server-side code of the website to modify or reuse it. Thus, it is not SaaS. But wait a minute: does that mean Gmail is not SaaS? Not quite. Gmail is SaaS because, although it has a relatively fixed interface that users cannot modify completely, it is also exposed publicly as a set of APIs that third-party libraries can plug into, not only to change Gmail’s interface for particular users but also to extract email and most of Gmail’s functionality and integrate them into a third-party application. That is something you cannot do with an ordinary third-party website.
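As a rough illustration of that last point, here is a minimal sketch of a third-party program pulling messages out of a Gmail inbox over Gmail’s public IMAP interface, using the JavaMail library. The account and password are placeholders; the point is simply that this kind of programmatic access is exactly what an ordinary website does not give you.

```java
import java.util.Properties;
import javax.mail.Folder;
import javax.mail.Message;
import javax.mail.Session;
import javax.mail.Store;

// A third-party application reading a Gmail inbox over IMAP.
// Requires the JavaMail library on the classpath; credentials are placeholders.
public class GmailReader {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("mail.store.protocol", "imaps");

        Session session = Session.getInstance(props);
        Store store = session.getStore("imaps");
        store.connect("imap.gmail.com", "someone@gmail.com", "not-a-real-password");

        Folder inbox = store.getFolder("INBOX");
        inbox.open(Folder.READ_ONLY);

        // Print the subjects of the five most recent messages.
        Message[] messages = inbox.getMessages();
        for (int i = messages.length - 1; i >= 0 && i > messages.length - 6; i--) {
            System.out.println(messages[i].getSubject());
        }

        inbox.close(false);
        store.close();
    }
}
```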

So, I finally got back to the first question, and I have an answer. No, a website is not considered SaaS. Phew! That only took about two blog entries to answer! What about a website being the cloud? No, a website is not the cloud. However, it can be running on a cloud (and gain all the benefits of near-infinite scalability, fail-over, high availability, clustering, etc.). But wait a minute: a website is not SaaS, yet it can run on the cloud? How is that possible? Well, did you ever (as a developer) write non-OOP code in an object-oriented language? What about forcing SOAP to manage sessions? You get the point. It is not ideal to run an application on the cloud that is not highly service-oriented. If you run such a badly designed application, you will not utilize the full power of the cloud; you will just be using it as a simple hosting provider for your site, which is what many companies do after all. The reason I say you will not realize the full benefit of the cloud with a non-service-oriented application (such as a typical website) is session management: if your website has poor (non-distributed) session management, scaling it out is going to be really hard. How will you make sure that when a new request goes to another instance, it has access to the session started on the original instance? Can a user withdraw money ten times just because she hit ten different instances that are out of sync with each other? Remember, the cloud is not magical (outside of the context of the Care Bears and their magic clouds in Care-a-Lot). So you must make sure that your application is SaaS-ready before you can realize the full benefit of the cloud. Otherwise, you are deploying your application to one physical server on the cloud and will never realize the scalability or clustering powers that are available to you for a small fee. In short, a website can run on the cloud, and it will exploit the cloud’s scalability and clustering capabilities if and only if it is designed to be service-oriented. You can still have session management, but it needs to be distributed and managed outside of the services that your website’s server-side code uses. That way, you can scale out the stateless components and replicate the stateful cache.
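One way to picture “session state managed outside the service” is an interface like the one below: the web tier stays stateless and reads and writes session data through a store that, in a real deployment, would be backed by a replicated cache or database shared by every instance. The interface and the in-memory stand-in are purely illustrative, not any particular product’s API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Session state lives behind this interface, outside the web tier,
// so any instance can serve any request for the same session ID.
interface SessionStore {
    void put(String sessionId, String key, String value);
    String get(String sessionId, String key);
}

// In-memory stand-in for a distributed cache. In a real deployment this
// would delegate to something like memcached or a shared database.
class InMemorySessionStore implements SessionStore {
    private final ConcurrentHashMap<String, Map<String, String>> sessions =
            new ConcurrentHashMap<String, Map<String, String>>();

    public void put(String sessionId, String key, String value) {
        Map<String, String> data = sessions.get(sessionId);
        if (data == null) {
            sessions.putIfAbsent(sessionId, new ConcurrentHashMap<String, String>());
            data = sessions.get(sessionId);
        }
        data.put(key, value);
    }

    public String get(String sessionId, String key) {
        Map<String, String> data = sessions.get(sessionId);
        return data == null ? null : data.get(key);
    }
}

public class SessionDemo {
    public static void main(String[] args) {
        SessionStore store = new InMemorySessionStore();
        // Instance A handles the login request and records the user.
        store.put("session-42", "user", "alice");
        // Instance B handles the next request and still sees the same session.
        System.out.println(store.get("session-42", "user")); // prints: alice
    }
}
```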

So, in summary, the cloud keeps the same meaning it has always had (as a UML-like component on a blackboard): it remains the important but irrelevant part of the system. And, coming full circle, I open my mouth, this time confident that I have covered all the possible paths a curve ball may take. I told her that her company’s website can indeed run on the cloud; in that case, the cloud would be her company’s hosting provider. However, the cloud is not a single technology; it is a mixture of solutions and resources. Depending on what her company needs the website to do or offer, and the guarantees it makes to customers about availability and up-time, her company may have to review which cloud service provider offers the best contract (the SLA, or service level agreement), and she may need to check with her developers that the website is cloud-ready and will take full advantage of those guarantees. She followed up with a question: “I see, so what is SaaS?” Oh man! Sometimes I wish I could copy and paste my thoughts into speech so I wouldn’t have to repeat everything in this blog to her again. But maybe it was good that I had to talk about it again, because this time I had to describe SaaS in even simpler terms. So I said: Well, if your website offers a service that your customers use on the website, and you would also like them to use that same service on the iPhone, iPad, Droid, etc., as well as offer the exact same service to THEIR customers on their own websites, then you have to provide that one service as SaaS. That means your developers have to design it to be detachable from your website and available to be called and used by other, third-party (your customers’) applications. Your website will use that service as well. It is good to design all your reusable services this way when you go to the cloud, because it enables you to take advantage of what the cloud provides, such as scalability (you can handle spikes in user requests).

She seemed to follow my statements and logic, so maybe I did pass the “explain it to your grandma” test! Although I may have hurt my chances of having her call me back after I hand her my business card, hoping to consult on how to design such a cloud-ready application, since my answers made this whole complex cloud concept sound like a pretty and easy picture! The cloud can offer pretty much anything anybody needs on the web (because of its “potential” to be proprietary-free, interoperable, and pluggable), except the opportunity to take back what you just said to someone else :)

Cloud Computing Simplified (Part I)

December 14, 2010

Gartner projected that cloud computing will continue to pick up momentum next year as it cements its position as one of the top four technologies companies will invest in. Earlier this past November, Gartner also projected that cloud computing would (continue to) be a revolution much like e-business was, rather than just another technology added to the IT arsenal.

What is cloud computing? A tech-challenged friend of mine asked me that one time. As I took a deep breath before letting out the load of knowledge I have accumulated reading about, teaching, and working with cloud applications in their various layers, I froze for a second: I did not know how to explain it in layman’s terms. Of course, if I were talking to an IT person, I could get away with the mouthful of ?aaS acronyms (where ? is any letter of the alphabet). But when the person opens the discussion by asking “Is my company’s website a SaaS?”, “Is my company’s website a cloud? The cloud? Running in a cloud?”, it hit me why every book I pick up or article I come across about the cloud includes a first paragraph with a disclaimer that goes something like this: “The first thing about cloud computing is that it has many definitions, and no one can define it precisely, but we will give it a try in this book.” Maybe that is how I should start every attempt to define the cloud for someone else?

Before I answered my friend’s question, I phased out for a bit as a big blackboard suddenly appeared in front of me, covered with a humongous set of UML diagrams (they were called UML, but they included every possible shape, along with a verbal disclaimer from the architect at the board for straying from the standard diagrams). Those diagrams were so big and complex that people moved from typing notes to taking pictures of the board. “High resolution” became an important differentiator when choosing between competing smartphones (the iPhone won that war for me, as it has the best resolution as of this writing), because at the end of every meeting you would see everyone in the room shouldering each other to take a picture of the complex diagram with their phone. None of that stuck in my mind, as I phased out, as much as the big cloud drawn off to the side of the big complex picture. That component was always known by everyone as the Cloud, and it always meant “everything else”, “something else”, “anything”, “everything”, “something”, “I have no freaking clue what is there”, etc. It was the piece no one cared about; it just sat there encompassing a big piece of the entire system, yet it was apparently less important than everything else on the board, earning its “cloudy” scribble off to the side. OK, so the Cloud is something useless that no one pays attention to? That would be a cop-out if my friend heard those words out of my mouth after a pause.

All of a sudden, something else happened. As I was dazing off, I started thinking to myself: Well, wait a minute. Did we change our definition of that little useless cloud drawn on the board? Or is it really useless? To answer that, I had to understand why the hell we resorted to drawing “everything else”, “something else”, etc. as an isolated cloud. Maybe it wasn’t useless. Maybe it was extremely important but irrelevant, which is a big difference. Sometimes it is good to have a big chunk of your system abstracted away into its own little cloud shape, irrelevant to any other changes you make to your system. After all, if every change you make affects everything else in the system, including your infrastructure, there is a big problem in your system’s design to begin with. So maybe it was a good thing that architects had a big cloud sitting off to the side. As a matter of fact, the bigger the cloud that neither affects nor is affected by your changes to the rest of the system, the better! OK, so now I have solved one mystery. The cloud is the part of your system (infrastructure, platform, and software) that is abstracted from you because your system was designed correctly, with its components independent of each other. This way, you can concentrate on the business problem at hand rather than on all the overhead that is there merely as an enabler rather than being fundamental to the business at its core.

But wait! If architects have been drawing clouds on their boards for such a long time, what is so new about the cloud? The short answer: nothing! The cloud is not something that just popped up; it has been around for a long time, from clusters of mainframes in the 60s and 70s to grid computing. The reason the professional world picked it up only recently is the giant improvement in virtualization management tools that lets customers easily manage the complexity of clouds. OK, I am getting somewhere, but I still haven’t gotten to my friend’s questions. Maybe I should keep digging for more answers before I unload my knowledge, or lack thereof! Back to the board. Was every cloud really a cloud? After all, sometimes an architect draws a cloud around a piece or component that she doesn’t understand. So there has to be a difference between the cloud a knowledgeable architect draws and one drawn by someone who mistook it for another UML component. So, what defines a cloud? Well, it is an abstracted part of the system. It may encompass infrastructure, platform, and/or software pieces.

That is the easy part. Here comes the more technical part. The cloud provides infrastructure, platform, and software on an as-needed basis (as a service). It gives you more resources when you need them, AND it takes away extra resources you are not using (yes, the cloud is not for the selfish among us). For it to be a form of utility computing (where you pay only for the resources you use), a cost is associated with the resources you consume, on an hourly (or per-resource) basis. Yes, if it is free, then it is not the cloud; there is no place for socialists on the cloud. We will skip the paid-versus-free argument, because you will tell me that many cloud-based applications are available for free, and I will tell you that nothing is free: those same applications come in a free version (which is just a honeypot for businesses) and a paid version (which is where businesses need to end up just to match the capabilities of their in-house software). The cloud may be free for individuals like you and me who don’t care about up-time and reliability, but I am focusing my discussion here on businesses (after all, an individual will not have their own data center or require scalability, etc.). So I win the argument, and we move on!

Another condition is that all those resources be available via TCP/IP-based requests (iSCSI or iFCP for hardware; web-service API calls for platform and software resources). It is important that requests and responses to the cloud’s resources travel over the network, otherwise you cannot scale out to other data centers sitting somewhere else. The cloud is scalable (“infinite” resources), disaster recovery (DR) ready, fail-over (FA) ready, and highly available (HA). The last three characteristics are made possible by a few technology solutions that sit on top of the cloud, such as virtualization, although virtualization is not strictly necessary in a cloud because of the commodity-computer factor discussed below. With virtualization management tools, DR (for applications only), FA, and HA are provided out of the box; DR for databases is made possible by replication tools (and acceleration and de-duplication techniques have sped up the process across physical LANs). Scalability comes from the SOA (service-oriented architecture) character of the cloud, with its independent, stateless services. Another component essential to defining a cloud is the use of commodity computers. It is essential because single-CPU power is no longer relevant as long as you have a near-infinite pool of average-Joe CPUs. Google builds its own commodity computers in-house (although no one knows their exact configuration), and it is believed that Google has up to 1,000,000 commodity computers powering its giant cloud. If a cloud service provider is using powerful servers instead of commodity computers, that is an indication that it does not have enough machines to keep its promise of scaling up for you. The reason both properties cannot co-exist is that if the provider really had “unlimited” powerful servers, then to offset their cost (especially cooling) it would have to charge extremely high prices, wiping out the benefit of moving to the cloud versus running your own data center. Many people also expect the cloud to be public (meaning that your application may be running as a virtual machine right next to your competitor’s on the same physical machine, with a low probability, albeit greater than 0%).

Many companies choose to run their own “cloud” data centers (private clouds), which theoretically violates a few of the concepts we defined above (cost per usage, unlimited scalability). That is why some don’t consider private clouds to be clouds at all. Nevertheless, much to their dislike, private clouds have dominated the early market: companies have bought into the cloud concept, but they still fear that it relies on too many new technologies, which makes it susceptible to early security breaches and problems. Amazon had a major outage in its East region in Virginia toward the end of 2009 (which violates the guarantee that your services are always available because they are replicated across separate availability regions). So, we know what private clouds are (internal data centers designed with all the properties of a cloud minus cost per usage and unlimited resources), and what public clouds are (Gmail, anyone?). What if you want a combination of both? Sure, no problem, says the cloud: you have the hybrid cloud, a combination of private clouds for your internal, business-critical applications that do not require real-time scalability, and public clouds to which you push the applications that require the highest availability and scalability (take Walmart’s website around the holidays, for example). This is going to be the future of the cloud, as there will always be applications that do not need the power of the cloud, such as email (which does not need to scale infinitely in real time), HR, financial applications, etc. There is a fourth kind, the virtual cloud, which offers the advantages of both worlds (public resources but dedicated physical hardware). You have access to a public cloud, but within your own secure, isolated VLAN that you access via an IPsec VPN. The virtual cloud guarantees that no other company’s applications will run on the same physical hardware as yours. Your internal applications connect to your other services sitting on the public cloud via secure channels and dedicated infrastructure (on the public cloud).

If you are confused about how applications (from various companies) can run on the same physical machine, and why in hell you would want to do that, check out my two-part series on virtualization.

— To be continued here

A Suggestion for the TSA

November 22, 2010

As you have all read, the TSA has approved new measures to increase security at the airport by forcing travelers to go through highly intrusive pat-down procedures should they refuse to go through the microwave machine. If you choose to refuse the pat-down as well, you will be forced to pay an $11,000 fine and potentially go to jail. The latter measure is there to ensure that a terrorist won’t try to test the waters by bringing in a bomb strapped around his waist under his underwear, and, when he is selected “randomly” because he is a little too tanned to be innocent, simply opt out of all checks and leave the airport. That is a point well taken.

Surprisingly, people did not respond too well to letting a random person fondle them through their clothes in the name of security. And to add to the shock, they certainly did not take well to the idea of having an adult fondle their kids either. As we all know, nothing is more dangerous than a loaded child’s underwear or diaper. Under a lot of pressure, the TSA finally decided that kids under 12 will not be required to go through the pat-down process. This irresponsible decision just opened the door to a whole new level of security cracks at the airport, through which a terrorist baby (an argument made by Rep. Louie Gohmert (R-Texas) about how Muslims come to the US pregnant, deliver a US citizen, then take the baby overseas for terrorist training until the baby is ready to carry out a terrorist attack) can smuggle, or be used to smuggle, bombs through security checks.

I personally have no problem with added security. After all, it is there to protect us from harm, or, in my case, being a person of Middle Eastern descent, to protect me from myself. However, I see a lot of problems with the security approach pursued by the Transportation Security Administration: the reactionary approach it follows after every terrorist attempt. One terrorist hides a bomb in his shoes, and we all have to take our shoes off for security screening. One terrorist hides a bomb in his underwear, and we all have to go through a testicle examination at the airport. As a software engineer, I learned that patching issues after they occur, rather than thinking of a bigger solution to all potential future problems, is a big no-no. The reason this approach fails is that there will always be another problem in the future requiring another patch. However, if I take a step back and change my strategy overall, I may be able to change my code at the base and eliminate many potential problems. For example, British spies found out that Al-Qaeda is planning to train radical doctors to implant bombs in women’s breasts. If they do that, the bomb can go undetected by all the security measures we have today. We are not worried about it today because no woman has tried it yet. I heard women overseas are very picky and refused to implant bombs that would make their boobs less than size D, less round, or less natural to the touch. That is why the process is dragging on a little longer than what Osama had in mind. But it will come at some point, and a woman will get through the security checks, and, if we are lucky, she will be caught and stopped by air marshals when smoke starts coming out of her breasts as she tries to trigger the bomb. “Her boobs were smoking, and I could think of nothing else but jumping on her and defusing the bomb,” the air marshal will say. Then we will hear about the boob bomber (don’t look for the domain name, because I already purchased it). Now what? The only way to start checking for boobs that are “the bomb” is by training security guards to grab female travelers’ boobs to find potential bombs. The “random” search trigger may be “a woman with large boobs”. After all, women with smaller breasts may not have bombs, or maybe the bomb is too small to cause damage. “I will grope any big-breasted woman who might be carrying a bomb, for the sake of my country and its national security,” will say any patriot working at an airport security check near your city. This whole approach of reacting to something that already happened won’t get you anywhere. It reacts to something that failed or succeeded, but it won’t help against future attempts of a different nature.

The TSA must change its strategy and find another way to increase our security instead of reacting to everything terrorists do. If they choose to continue down this path, to see how far they can stretch this, or us, I have a solution that will quiet all the whiners: a long-term solution that fixes the whole problem at the root rather than patching it every time an issue arises. If you want to pat down all travelers, invade our personal privacy, and potentially add more and more intrusive pat-down procedures, then your best bet is to introduce a little flavor into the process. Bring in really good-looking security agents (men and women), and allow travelers to select from a line-up the person they would like to grope them, from the opposite sex (or the same, based on the traveler’s choice). We should still have the option to ask for a private room like we do today, and additionally we should have the option to ask for seconds, or thirds, if we think we may be a danger to this great country. I think all Middle Easterners, and everyone with a funny accent, should have the choice to select more than one person and go through several pat-downs at the same time, or sequentially, in a private room. When the TSA introduces this measure, many people will change their minds about the procedure; after all, we are going through tough times, and we cannot all afford paying for singles at a club somewhere. I think this will not only be received favorably by many people but would actually boost the transportation industry. After all, a trip to the strip club for a few lap dances and alcoholic drinks could cost you more than a few days’ vacation in another city, plus the free pat-downs. This would also save the TSA a lot of money, because they wouldn’t have to pay for expensive machines anymore. It would also allow them to introduce ever more intrusive measures, and people would receive them even more favorably than the previous, less intrusive ones. And hey, maybe this kind of procedure would push a terrorist to have a change of heart after going through the experience; after all, a bird in hand is better than 77 somewhere in heaven.

Virtualization – Under the Hood (Part II)

November 19, 2010

This is a continuation of my last post (Virtualization – Under the Hood (Part I)).

Q: Can you actually have more memory allocated than available physical memory? And how?

Short Answer: Yes. Through many techniques, including transparent page sharing, page reclamation, the balloon driver, etc.

Long Answer: You can actually start many VMs whose total allocated memory exceeds the physical memory available on the server, because not all applications utilize 100% of their requested memory at all times. Page reclamation allows the hypervisor to reclaim unused (previously committed) pages from one VM and give them to another. Another technique a hypervisor may use is to let VMs share memory pages without their knowing it! That sounds scary, but it is nonetheless manageable by the hypervisor, and it allows more VMs to start with their full allocation requirements met, even though they may be sharing pages with other VMs. Lastly, there is the approach of ballooning memory out of a VM. This is more of a cooperative approach: the hypervisor requests memory from all executing VMs, and they voluntarily balloon out the memory pages they are not using. Once they need the memory back, the hypervisor returns it after obtaining it from other VMs using any of the methods above. Swapping pages at the hypervisor level is an expensive operation; that is why you should always start your VM with a preset reservation amount (the minimum amount of memory guaranteed to the VM by the hypervisor). However, the more you reserve when starting your VM, the fewer VMs can be fired up on that same physical host.

Q: How do you make sure your application is highly available?

Short Answer: It depends on the approach, and on the virtualization suite you are using. You either take things into your own hands and cluster your application over two or more VMs, making sure you replicate the necessary data to the redundant VM(s), or you use the tools provided by the virtualization suite to move the VM to a less utilized, less crowded host.

Long Answer: High availability of VMs can be jeopardized in one of two ways:

1. Your VM is running on a highly utilized host, making your applications less responsive. In this case, you can use the virtualization suite to migrate your running VM to another, less crowded host. vSphere provides vMotion, which its HA features use to migrate your VM to another host without taking the VM down! The hypervisor starts copying your VM’s memory page by page, beginning with the sections that are idle or under-utilized at the moment, while keeping track of all pages “dirtied” since the last pass so it can re-transfer them and keep the copy on the new host consistent. At some point the hypervisor on the first machine turns the VM off while the hypervisor on the target machine turns it on, nearly simultaneously (a simplified sketch of this pre-copy idea appears after this list). Microsoft added Live Migration to the R2 release of Hyper-V to do just that. There are many metrics and thresholds that can be configured to trigger such an action. Distributed Resource Scheduler (DRS) in vSphere lets you set those parameters and step back: DRS will move your VM from one host to another within the cluster to ensure the highest availability and accessibility.

2. A host goes down and another VM needs to fire up to start taking requests. This can be handled with the virtualization suite’s tools (but only when data replication is not required). When you need to cluster your data as well, you must introduce data replication yourself, for example with Microsoft SQL Server clustering. This allows the new VM to serve requests immediately once the first VM goes down. Of course, there will need to be some form of switch-over control, either at the virtualization suite management level or through an external context-switching appliance such as NetScaler.
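To illustrate the pre-copy idea from point 1, here is a toy simulation in plain Java. It is not vMotion or Live Migration code, just the general control loop: copy all memory pages, keep re-copying whatever gets dirtied in the meantime, and only pause the VM for the final, small remainder.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Toy simulation of iterative pre-copy live migration. Pages are just
// integers and "dirtying" is random; real hypervisors track dirty pages
// in hardware/bitmaps, but the control loop looks roughly like this.
public class PreCopyMigration {
    public static void main(String[] args) {
        int totalPages = 10000;
        int stopAndCopyThreshold = 50; // pause the VM only for this many pages
        Random random = new Random();

        // Round 1: every page still needs to be copied to the target host.
        Set<Integer> dirty = new HashSet<Integer>();
        for (int page = 0; page < totalPages; page++) {
            dirty.add(page);
        }

        int round = 0;
        while (dirty.size() > stopAndCopyThreshold) {
            round++;
            int copied = dirty.size();
            dirty.clear(); // these pages are now consistent on the target host

            // While we were copying, the still-running VM dirtied some pages again.
            int newlyDirtied = copied / 10; // assume ~10% get re-dirtied per round
            for (int i = 0; i < newlyDirtied; i++) {
                dirty.add(random.nextInt(totalPages));
            }
            System.out.println("Round " + round + ": copied " + copied
                    + " pages, " + dirty.size() + " dirtied again");
        }

        // Final stop-and-copy: pause the VM briefly, copy the last few pages,
        // then resume it on the target host.
        System.out.println("Pausing VM to copy the final " + dirty.size()
                + " pages, then resuming on the target host.");
    }
}
```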

Q: Is virtualization the equivalent of parallel processing on multiple processors?

Short Answer: Yes and no.

Long Answer: Virtualization introduces the ability to borrow CPU cycles from all of the CPUs physically available on the host, which has the side effect of parallel processing. However, the only reason a hypervisor borrows cycles from another CPU is that the first CPU it had access to is fully utilized. So, technically, you are not parallelizing in order to run things in parallel, but rather to grab as many CPU cycles as your application needs to run its single- or multi-threaded code.

Q: Since we are running multiple VMs on the same host, doesn’t that mean we share the same LAN? Wouldn’t that be a security threat if one application on one VM were misconfigured to access an IP of a service on another VM?

Short Answer: Yes. But we don’t have to share the same network even among VMs on the same host.

Long Answer: You can create virtual LANs even between VMs running on the same physical host. You can even use firewalls between VMs running on the same host. This way you can create DMZs that keep your applications (within your VM) safe no matter which VMs are running on the same host.

Q: Since a hypervisor emulates hardware, does that mean that my guest operating systems are portable, EVEN among different hardware architectures?

Short Answer: Yes.

Long Answer: Most virtualization suites support x86 architectures because they are “better” designed to take advantage of virtualization. It also depends on the virtualization suite you are using and which guest OSs it supports (for example, vSphere does not support AIX). Additionally, although in theory guest OSs and their hosted applications are portable, portability also depends on the hypervisor’s own implementation of drivers. The hypervisor does not use the drivers installed inside the guest OS, but its own set of drivers, and that implementation can vary from one system to another and from one device to another. So you may well end up with different behavior or performance on different hardware, even using the same VM.

Note: The Open Virtualization Format (OVF) is a standard format for saving VMs so you can migrate them to another virtualization suite (not just to other hardware!). However, not many virtualization tools support this format yet.

Q: What about security? Who controls access to VMs?

Short Answer: Virtualization suites provide their own user management. That user list is separate from your applications’ users.

Long Answer: There are many layers of user roles and permission management in a virtualization suite, depending on the suite itself. Typically, you can create users and define their roles, their access to VMs, and the type of permissions they get. You can even create pools of VMs and apply the same role/permission combinations to them. This eliminates having to manage security and authentication on each individual hypervisor; instead, you manage it across a whole host of them.

Q: Ok, ok, ok. How about the dark side of virtualization?

Short Answer: There are many problems or potential problems with virtualization. It is not all roses.

Long Answer: There could be many reasons why not to use virtualization including:

1. With virtualization, you are now collapsing the system administration and networking teams (and possibly security as well) into one team. Most (if not all) virtualization suites do not provide separate management roles for the virtualized data center along those divisions. Once someone has administrator access to the virtualized data center, those divisions are gone. This can be seen as a good thing, but it is mostly a bad thing, because a system administrator is not necessarily highly specialized in carving up the network among the various applications according to the requirements and capabilities of the enterprise.

2. Upgrading or updating one application or VM now requires much more knowledge of its requirements and potential effects on other VMs on the same host. For example, if an application doubles its memory requirements, the IT administrator managing the virtual environment must know about it, even if the increase lands on a host that has enough physical memory. In a traditional environment, as long as the physical memory is available, the IT administrator deploying the update or upgrade does not necessarily need to know about the application’s new memory requirements, as long as no additional physical memory needs to be attached to the server. This change forces administrators of virtual environments to be more aware of and knowledgeable about the applications and VMs running in their system, which is not a hard requirement in traditional systems.

3. If the number of VMs falls under 10 or so per host, you may be adding more overhead than you gain from virtualizing your machines.

4. Testing and debugging system events is a lot more involved now, as an administrator has to chase a VM wherever it goes to check the trail of events across machines, and then look at the guest OS event logs as well to complete the picture before understanding the problem.

5. Created VMs require physical storage space as well (for the VM files themselves). This is an overhead, and if not managed correctly you may end up bursting your storage capacity bubble by over-creating VMs.

6. Miscellaneous: expensive management tools, new IT skills to be learned, a single point of failure (if one host goes down, it takes many VMs down with it), more bandwidth headaches when a physical host starts up (many VMs initializing at the same time), etc.

Virtualization – Under the Hood (Part I)

November 19, 2010

Sometimes we take long pauses thinking about something someone said. We are not discarding what is being said, but we are not acknowledging it either. Sometimes the person repeats the same sentence, maybe in a different tone or structure. And we may continue to pause, although we may shift our eyes to the speaker’s eyes as a less formal way of saying: I heard you, but I am still thinking about it! It may take some time before we let the first word out of our lips, just to start the conversation.

Virtualization is not a new concept by any means. It started in the 1960s as a way to virtualize memory, tricking system applications into thinking there is more memory to play with than there actually is. This evolved into the concept of time-sharing on mainframes, where individual applications were made to believe they had all the resources they needed, although shared and restricted in time. I am going to skip a few generations and talk about today’s virtualization, to avoid duplicating the well-maintained definitions and history lessons on virtualization all over the Internet. However, our long pause over the concept of virtualization and its potential extended over generations, rather than a few seconds, before we realized what we really had at hand.

VMware is certainly the market leader today through its vSphere offerings, with Microsoft’s Hyper-V behind but making long strides to catch up. There are many other virtualization suites offered by Cisco (mainly around unified communication services) and Citrix (mainly around virtual desktops), as well as open source hypervisors such as Xen.

We hear a lot about virtualization. But it is one of those topics that you “know” without really “knowing”. We all know why objects fall toward the center of the Earth, but we don’t really know “why” objects fall toward the center of the Earth (we know gravity, but we don’t understand why negative and positive charges attract). We always hear about virtualization. It is a catchy word in an industry with a long list of names, acronyms, and abbreviations. But do we all know the basics of how it works? What it is? How it can answer some of the main enterprise questions that haunt administrators in their sleep? How it can actually add its own set of issues that make an administrator’s nightmare not so … virtual? I explored some of those questions and decided to take a few minutes from my sleep (that is the only way my hypervisor can lend me those valuable resources) to share them here. I will address them in the form of questions and answers, simply because that is the easiest way to get to the point without dancing around it.

Q: Is virtualization a software or a hardware technology?

Short Answer: Both.

Long Answer: There are two types of hypervisors: Type 1 runs directly on the hardware, and Type 2 runs on a host OS. Type 1 is a hardware virtualization solution simply because it fits our definition of hardware (we define hardware as anything below the supervisor code in an OS, and since the hypervisor sits below the supervisor code, it is a hardware solution). In the other type, Type 2, the hypervisor runs on top of the host OS as just another application. This is not very common, for a lot of reasons, including the requirement to modify the OS to accommodate virtualization, or having to settle for major hypervisor overhead in translating to host OS terms without being able to optimize drivers. The conversation goes too deep for this post, but there are many flavors of hardware virtualization (hardware-assisted virtualization, paravirtualization, partial virtualization, etc.) where hardware assistance is mixed into software-heavy hypervisors.

Q: Why virtualize?

Short Answer: To get the most out of our idle resources today.

Long Answer: How much time do you have? There are a lot of reasons to virtualize your OS and its applications. Here is a short list:

1. Most applications today utilize around 10-15% of their CPU. With virtualization, that utilization increases. “Virtualization is extraordinarily expensive if the number of VMs fall below 10 on a physical machine” (Mike Rubesch, Purdue University Director of IT infrastructure systems). You also have to account for the overhead of running the hypervisor, which takes a few CPU cycles as well.

2. Less physical space. If you combine 10+ applications on the same physical host, you could potentially decrease the number of your servers by a factor of 10+.

3. Quick turn-around. It is much faster to create a VM from a VM template than to build a machine.

4. Introduces a leaner process. The concept of leasing VMs and letting them expire enables IT administrators to build VMs for a pre-determined length of time. In traditional systems, such a machine might sit unused after a certain period of time, adding more overhead and headache.

5. Easier disaster recovery (DR), high availability (HA), and fail-over (FA), as most virtualization suites include tools to manage those concerns (such as vSphere’s and Hyper-V’s HA and FA products, enabled by vMotion and Live Migration).

6. And many other reasons that I don’t want to talk about here including: one stop security management, lower energy cost, self-contained infrastructure requirements, portability of machines, etc.

Q: Which applications are best kept un-virtualized?

Short Answer: Applications that require the fastest possible responses or that are CPU-bound.

Long Answer: You can virtualize everything, but you cannot offer more throughput than your server can physically handle. Applications requiring heavy I/O, such as databases and other applications that write to disk constantly, may be slowed considerably by virtualization; I have come across a few companies that refuse to virtualize their database servers. Furthermore, since virtualization is overhead (an additional layer between the application and the hardware that does extra management work across all VMs on the same physical machine), you can actually see slowness when running your application. The worst cases used to be no more than a 20% decline in performance, and nowadays you rarely see declines that high, but the overhead is definitely above 0%. So, if your application queries the hardware clock for nanoseconds, maybe it is not a good idea to run it on a virtualized machine.

More to come…

CTO versus Architect

February 4, 2010

A software architect is responsible for setting the road map for his company’s software. A CTO is an architect with a twist: he is constrained by another factor, the bottom line. While the architect is enthusiastic about new technologies and gets excited about new and cool tools, the CTO must balance that excitement against his department’s ability to deliver and not be blinded by the colorful display of those technologies. It is much tougher to be a CTO than an architect because you have to resist all the temptations. Dynamically typed languages are pretty cool and hip, but if the application leans heavily on those dynamic behaviors, you may end up gambling with your bottom line by risking an application that is untestable.

For an architect, cool technologies must also not be advocated just because they are cool. However, it is much easier to get lost in those feelings when emotions run high at the very sight of those cool technologies. As an architect, you are loyal to advancements in technology. A CTO is loyal to the company. A CTO is not a better person than an architect; he simply has more at stake in the company than an architect does. Adding more programmers may hurt the bottom line (more cost), but to an architect it is a great opportunity to explore another technology or tool. You may argue that this is not necessarily a bad thing, and may actually help the company in the long run. That is what an architect would argue, anyway. Exactly, and that is why I said a CTO’s job is tougher: he not only has to have the outlook of an architect, but also has to consider the opportunity cost of where else the money could have been spent to maximize ROI. As if this were not enough, he further has to evaluate the business risks that mount from flirting with new technologies. Using cutting-edge technologies may be cool, but it is not necessarily a wise business decision if it introduces slower development, riskier code in production, code that is harder to unit test, etc.

There is a reason why a CTO should be a big part of a company’s top management. She enforces policies and budgets to protect the bottom line while ensuring R&D is not suffocated. It is very important for the CTO to be keenly aware of the financial flexibility she has, as opposed to rigid “budgets,” because she should be trusted to handle that flexibility responsibly. She should take out of the basket only what she thinks is needed to optimize the ROI margin while continuing to give her company a competitive edge in the market.

Don’t be a technology slave

February 4, 2010

Technologies are tools to solve a problem, not the end goal. I like to think of a technology as a screwdriver used to assemble the product rather than the finished product itself. As computer scientists, we are problem solvers, not poets of a certain language; otherwise we would just be called code entry experts. Understanding a problem and outlining a solution should come before selecting the technology.

I hear many people say that they hate Microsoft products. Some hate Google’s web APIs. Some don’t like Java because it is “too OS independent”. Etc. To me, that sounds like someone trying to pick their screwdriver before having a look at the screw itself. As a developer, you need to listen to the problem, and understand the solution, before picking the technology. A technology is supposed to help you implement your solution faster and more efficiently, not feed your love for that particular technology. You should be faithful to the concepts you were taught to balance, such as speed, memory, productivity, configurability, etc. You should not be faithful to a technology per se. There is a reason why new technologies come about more frequently than corruption scandals in Chicago politics: the problems in the enterprise are becoming more complex and demanding as businesses become more competitive.

So, pick the technology that best suits your implementation requirements. Don’t let a pre-chosen technology dictate how you implement your solution. When I want to create a simple website that is not highly interactive, plain HTML with JavaScript and CSS will do. If I want powerful Ajax capabilities with a high-end UI, all delivered with a quick turnaround, then GWT may be a good choice. I like Java for backend processing because of the endless frameworks and third-party tools that support it. But when I want to create a powerful desktop application where the UI is a major focus, .NET is very competitive, although Java’s Swing, with all of its latest desktop integration and support, can give it a run for its money. I used Fortran 90/95 on a project for Argonne National Lab where the speed requirements for processing highly complex numerical equations made “Java” a dirty word. And when I wanted a technology to implement rule-based flow and processing at MedCPU, I chose Drools as our rules engine rather than pushing for an in-house Java implementation that would have amounted to re-inventing the Rete wheel.
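To illustrate what the in-house alternative tends to look like, here is a made-up sketch (not MedCPU code, and not how Drools works internally): a rule interface plus a loop that re-evaluates every rule against every fact on every pass. A Rete-based engine like Drools exists precisely to avoid that blind re-evaluation.

    import java.util.List;

    // A hypothetical hand-rolled "rules engine": every rule is checked against
    // every fact on every pass. Fine for ten rules; painful for a thousand.
    interface Rule<T> {
        boolean matches(T fact);
        void apply(T fact);
    }

    class NaiveRuleEngine<T> {
        private final List<Rule<T>> rules;

        NaiveRuleEngine(List<Rule<T>> rules) {
            this.rules = rules;
        }

        void fireAll(List<T> facts) {
            // No condition sharing, no memory of previous matches:
            // exactly the re-evaluation a Rete network is built to avoid.
            for (T fact : facts) {
                for (Rule<T> rule : rules) {
                    if (rule.matches(fact)) {
                        rule.apply(fact);
                    }
                }
            }
        }
    }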

Many times developers don’t see inherent support in their master (the pre-selected technology) for their chosen implementation. So they fix a bad choice with a complicated decision: they build an API to provide the tools necessary to implement the solution. This is the equivalent of filing down the screwdriver’s tip to make it fit the screw. As counterintuitive as that sounds, I have seen it done more often than I have seen the screwdriver replaced altogether with one that actually suits the screw.

Now, life is not that easy or straightforward. Most of the time a developer is limited to a pre-screened, pre-selected set of technologies. She has no choice of screwdriver when all she is given is one or two before anybody knows what kind of screw they will encounter. Who is at fault here, and how the hell is she supposed to drive the screw tight into place when she is not allowed to think outside of the box (the toolset box)? I think it is the architect’s responsibility to understand the big picture of all potential problems and select a good, representative set of tools for his developers. This is easier said than done. However, there are far more ways to mix technologies in today’s applications than there were, say, five years ago. Many technologies support scripting languages to provide capabilities the host technology cannot. Don’t be afraid to embed some Groovy into your Java code if you need to. Don’t be afraid to use IPC to bring in a whole new technology altogether for the new problem and its solution. There is absolutely nothing wrong with using multiple technologies in the same system as long as two conditions hold. First, you are not adding more complexity or dependencies to your system; if you are, maybe you started with the wrong choice of technologies to begin with, and it is time to start over. Second, the solution’s flexibility does not come at the expense of deliverability. At the end of the day, we are here to please the business. If deploying the new technology costs too much (learning curve, build script complexity, etc.) in terms of when the solution will be available to the business user, then, even though it may look like it, this is still not the right screwdriver.
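As an example of the Groovy-inside-Java idea, here is a minimal sketch using the standard javax.script (JSR 223) API. It assumes the Groovy jars are on the classpath so that an engine named “groovy” is registered; the variable names and the script itself are made up.

    import javax.script.ScriptEngine;
    import javax.script.ScriptEngineManager;
    import javax.script.ScriptException;

    public class GroovyEmbedding {
        public static void main(String[] args) throws ScriptException {
            // Looks up the Groovy engine that the Groovy jars register on the classpath.
            ScriptEngine groovy = new ScriptEngineManager().getEngineByName("groovy");

            // Hand a Java object to the script and let Groovy do the dynamic part.
            groovy.put("prices", java.util.Arrays.asList(19.99, 5.25, 42.00));
            Object total = groovy.eval("prices.sum()");

            System.out.println("Total computed by the embedded script: " + total);
        }
    }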

A great system is a hybrid system with many arms in its arsenal. The flexibility not only to change but to adapt and play nice with new technologies is a powerful characteristic of great systems. So, try to select the “right” technology for the big business problem at step 0. Selecting the right technology means you will be able to implement most future solutions without having to change technologies altogether or do “too much” to make your existing technology implement the solution as expected. The system described above is a good system. What separates a good system from a great one is its ability to evolve and accept new technologies into its space without compromising the deliverability requirement. Whatever you do, never force the wrong screwdriver onto the screw. And as much as you hate Windows or Eclipse, there exists something out there into which those two technologies or tools will fit perfectly. So, don’t be a slave to technology.

It’s not working

February 4, 2010

There are a few things in life that I don’t like. They are so few that I can’t even remember them all right now. But what I dislike the most is when a developer tells me that the _____ (blank) doesn’t work. I will tell you exactly why I go bananas when this is the only thing I hear. When the code throws up a white flag due to an unchecked runtime exception, blank doesn’t work. But blank doesn’t work either when the computer does not start, or when the power runs out. So how the hell am I supposed to know what the problem is and how to fix it?

You know, I’ve had my share of “doesn’t work” from my users. One time, I was on the phone for over half an hour with an end user whose website was “not working”. It turned out he was looking at the website through Outlook (where all JavaScript was disabled in the reading pane). Even with him telling me he was 100% sure he was in Internet Explorer while still not seeing the menu options I was asking him about, I am OK with that. What I am not OK with is when a developer wastes even one minute of my time because she reported “it doesn’t work”.

I wouldn’t even mind hearing that from a QA. As a matter of fact, when my developers complain that QA isn’t providing enough information to reproduce a problem (and we all know how QAs and developers get along with each other), I tell them that when a QA doesn’t deliver enough information about a problem, it is the developer’s fault for not spelling out what is expected or for not providing enough logging for the QA to use and report back with. I can live with a QA who dusts off all responsibility and blows it my way with an “it doesn’t work” remark, but I sure cannot stand it when a developer says that to me.

There is a reason a language has millions of expressions and many overloaded words to describe events that differ only slightly from one another. Imagine if every poem just repeated the same word or sentence over and over and over again. Oh wait, that’s most of today’s music. But you get the point. “It doesn’t work” is not a good description of a problem. It wastes time communicating back and forth to get complete information. It also paves the way to sarcasm, which leads to disrespect and escalating politics. And it usually ends, as those situations do, with the issue left dangling in the middle of nowhere like a horribly coded, unreachable else statement. That is how bad code builds on top of other bad code and other problems.

Remember when I said there is nothing I dislike more than a developer saying “it doesn’t work”? Well, I also want to tell you that I hate something else. And here I am using “hate” to describe how this is worse, in my opinion, than the thing I dislike the most. So, what’s worse than “it doesn’t work”? “It still doesn’t work”! Those statements sound the same, right? Wrong! They sound as similar as Obama’s and Cheney’s hunting skills. I think Cheney is better at shooting his friends than Obama is. When a developer comes back and says “it still doesn’t work”, in my experience that means the new code with the fix produced the exact same error when the same scenario was run with all other influencing factors held constant. But more often than not, it “also” means: I ran the new code with a different scenario and got the same error. Even worse: I ran the new code with a new scenario and got a new error. Yet all of those testing scenarios were abstracted and encapsulated behind the same expression, “it still doesn’t work”. When an end user or a QA says that, to me it means the application is still not working, no matter the input or the scenario run. The end user or QA does not know all the execution paths of the code at a low level; they know a use case only. It is not their job to map a use case to an execution path in the code. That is the developer’s job. If you want more information from your QA, empower them with logging tools and knowledge. But be careful not to mix their responsibility as QA with a developer’s responsibility to “debug” code.

So, the next time you want to report to me that the code doesn’t work, make sure you have already done some investigating and collected some information to share with me. Even if it is not your code, you should be smart enough to gather enough detail to know where the problem lies and, most importantly, to know, based on your debugging, whom to hand that information to. I receive tickets for defects and bugs from end users and QAs, but I should receive “debugging information” from developers after they receive a bug from an end user or QA. Otherwise, if a developer just passes the bug along as originally reported, or worse, as he stumbled upon it on his own system, then he is about as useful as a second GPS in the same car.

Always debug the problem. Check the logs. Try to reproduce it. Collect more information from the user who reported the problem. Treat the bug as your bug until you cannot do anything more and have to delegate. And if a fix is pushed out and you encounter a problem, check whether it is really the same problem (under ALL isolated conditions) or a new one. Most technologies have great debugging tools and APIs, as well as meaningful error messages. And never say “it doesn’t work” or “it still doesn’t work” EVER again.
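To be concrete about the kind of “debugging information” I mean, here is a minimal sketch using plain java.util.logging; the class, method, and messages are made up. The point is that the log entry carries the input, what was expected, and the full stack trace, which is everything “it doesn’t work” leaves out.

    import java.util.logging.Level;
    import java.util.logging.Logger;

    public class OrderImporter {
        private static final Logger LOG = Logger.getLogger(OrderImporter.class.getName());

        public void importOrder(String orderId, String payload) {
            try {
                parse(payload); // hypothetical parsing step that can blow up
            } catch (RuntimeException e) {
                // Capture what was attempted, with what input, and the full stack trace.
                // This is the report to pass along, not "it doesn't work."
                LOG.log(Level.SEVERE,
                        "Import failed for orderId=" + orderId
                                + ", payloadLength=" + payload.length()
                                + ", expected a well-formed XML order document",
                        e);
                throw e; // let the caller decide what to do; don't swallow the error
            }
        }

        private void parse(String payload) {
            // placeholder for the real parsing logic
            if (payload.trim().isEmpty()) {
                throw new IllegalArgumentException("empty payload");
            }
        }
    }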