We are in the midst of a major architectural change at SRS. As new SRSWP services are created and products begin using our SOA/SaaS platform, it is vital to understand that every service must provide a stable architecture.
SRSWP services do not grow slowly or linearly like many new products. These are foundational elements which will be integrated into other applications. They will grow by entire user-bases at a time! For example:
- Until recently one basic service was handling about 9 requests/second at the top end.
- Over the course of a week two more applications started using it, and overnight the service jumped to 15 requests/second and then to 47 requests/second!
- One app that will begin using the service in the next few months will bring a load of around 29 additional requests/second.
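To put those numbers in perspective, here is a quick back-of-the-envelope calculation. It uses only the figures quoted above; the variable and function names are my own, and it assumes the 29 requests/second from the upcoming app land on top of the current 47 requests/second peak.

```python
# Peak request rates (requests/second) quoted in the bullets above.
observed_peaks = [9, 15, 47]
upcoming_extra = 29  # additional load expected from the next consumer

# Assumption: the new consumer's load adds to the current peak.
projected_peak = observed_peaks[-1] + upcoming_extra
overall_growth = projected_peak / observed_peaks[0]


def step_growth(rates):
    """Return the growth factor between each consecutive pair of rates."""
    return [later / earlier for earlier, later in zip(rates, rates[1:])]


print(projected_peak)                                      # 76
print(round(overall_growth, 1))                            # 8.4
print([round(f, 2) for f in step_growth(observed_peaks)])  # [1.67, 3.13]
```

In other words, a service sized for its original peak would need to absorb roughly eight times that load within a few months.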
SRSWP services are not expanding into new markets. We are designing services which will impact all SRS products and customers. At this level, even small mistakes result in major problems. So, before any release there are four major areas that must be considered:
- Security: Mission-critical systems must define and enforce security policies.
- Have we considered how to secure the service itself?
- Have we considered how to protect the data this service touches?
- Have we considered how to provide security at every level (application, software, operational)?
- Availability: Mission-critical systems cannot have downtime.
- How will this service provide 100% availability?
- How will we perform scheduled maintenance?
- How will we handle operational failures (e.g. rogue web head)?
- How are we backing up data and services?
- How will we regularly test our ability to restore and recover from backups?
- Stability: Mission-critical systems cannot crash; they cannot change contracts.
- Is the service 100% stable?
- How do we prove stability before release and on an ongoing basis?
- Scalability: Mission-critical systems must seamlessly scale to support load.
- How does this service scale to support a full load of users for the next year?
Any problem with an SRSWP service harms more than just that service and the team who owns it. Problems are magnified as they cascade to every consumer and their users. Problems are reflected across SRS's product lines. This must not happen!
If any service you work on cannot confidently answer all of the questions above, this must become a top priority. Work with your team leads and VPs and make sure this doesn't fall through the cracks.
Over the past few years, SRS has been undergoing an intense technological evolution. A fundamental part of this evolution is the mental transition from being entirely application-centric to embracing the principles of SOA and SaaS.
This transition has not been easy and has often led to confusion. One of the most difficult concepts for us to grok has been that of multitenancy.
Question: What is multitenancy?
Answer: Multitenancy is a software architecture where a single instance of an application serves multiple customers (tenants) simultaneously.
While this concept seems straightforward, many teams have struggled to correctly identify their project's customer (tenant).
Question: Who is the customer (tenant) for my project?
Answer: For all current SRS projects, there are two possible answers:
- For external-facing projects, your customer (tenant) is a shop.
- For internal-facing projects, your customer (tenant) is a project.
Once we have correctly identified our customer, our entire product must be built around that tenant. Our product must protect each tenant by safeguarding their data from any other tenant.
- In Direct-Hit, one shop may not access another shop's customer data.
- In SRSWP Logger, one project should not have access to another project's log files.
Every SRS project must be architected to support multitenancy. I acknowledge that there are circumstances where defining the tenant may seem subjective. If your project has any doubt about whether the tenant has been correctly identified, please ask. Any of the architects will be happy to assist.
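As an illustration of what "built around the tenant" can look like in code, here is a minimal sketch. It is not how Direct-Hit or SRSWP Logger are implemented; `TenantSession`, the in-memory dictionary, and the shop identifiers are all hypothetical. The idea is that a data-access handle is bound to one tenant when it is created, so application code cannot forget to filter by tenant.

```python
class TenantSession:
    """Data access handle bound to exactly one tenant (hypothetical helper)."""

    def __init__(self, storage, tenant_id):
        self._storage = storage      # shared dict: (tenant_id, key) -> value
        self._tenant_id = tenant_id  # fixed for the life of the session

    def put(self, key, value):
        # Every write is stamped with the session's tenant.
        self._storage[(self._tenant_id, key)] = value

    def get(self, key):
        # Every read is filtered by the session's tenant, so another
        # tenant's record with the same key is never visible.
        return self._storage.get((self._tenant_id, key))


storage = {}
shop_a = TenantSession(storage, "shop-a")
shop_b = TenantSession(storage, "shop-b")

shop_a.put("customer:1", {"name": "Pat"})
print(shop_a.get("customer:1"))  # {'name': 'Pat'}
print(shop_b.get("customer:1"))  # None
```

A real service would back this with a database and authenticated tenant identity, but the design choice is the same: the tenant is part of every key, not an optional filter.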
The success of our SRS platform (including EDGE, Direct-Hit, etc.) hinges on every project correctly implementing multitenancy.
This internal memo was sent to SRS Software this morning. The points are important enough that I wanted to cross-post here.
I know there has been a lot of frustration over the last year as we have been working on zVision, the web platform, and ReEn. The shift we are making takes us from creating products to the creation of a platform.
So far, most have only felt the pain of the transition and have not seen the advantages. I promise you that this transition will be worth the pain and frustration. We are on the cusp of realizing a payout and it will be huge!
A guy who worked at Amazon and is currently at Google recently posted what was intended to be an internal memo on this topic. Everyone should take time to study the attached memo (http://steverant.pen.io) and understand the points that are made.
Here are a couple of threads where people are discussing Stevey’s memo that you may find useful:
I will continue to do everything I can to clearly communicate the amazing direction we are headed.
- zVision puts us at the forefront of the technology industry!
- zVision is the platform we need to carry us for the next 10+ years!
- zVision will give SRS the flexibility and agility to dominate in our chosen market!
You are welcome to send me questions or comments privately. Alternatively, I have cross-posted this to my blog and you are also welcome to comment publicly there.
Nate Zobrist | VP of Software Architecture
Service Repair Solutions, Inc. — Revolutionizing the Delivery of Service and Repair™
770 East Technology Avenue, Building F | Orem, Utah 84097
Phone: (801) 437-5846 | Fax: (801) 437-5899 | Cell: (801) 788-4789
At Dreamforce '11, a presentation I particularly enjoyed was given by Ryan Smith from Heroku. Titled Designing for the Cloud: The 12 Factor App, Ryan discussed some fundamental design patterns and practices that have made Heroku successful.
One interesting analogy compared applications to Swiss Army knives. The analogy is relevant to SRS and provides a great visual depiction of the work we are doing as part of our SaaS and SRSWP initiatives.
Historically our applications were designed as large, monolithic beasts. Like this knife, every feature that could be imagined was rolled into one of our flagship products. This design meant:
- Duplicated effort because there were no shared components between product lines.
- Intense effort required to join the team due to the large, interconnected designs.
- Even small changes were risky and had the potential of destabilizing an entire product.
- Management of each product line required enormous effort to tightly coordinate development and release of new features.
Contrast that complexity with a design where:
- Components are small, independent apps that work together (like Linux tools).
- Each component delivers specific functionality.
- Touch points between components are well-defined contracts.
The workflow enabled by this component-based architecture is truly liberating.
- Small teams (perhaps even a single-person team) can build on top of the shared platform to quickly create new products.
- Most products do not need to worry about operational infrastructure, databases, etc.
- Products can take advantage of shared services to quickly enable powerful features in innovative ways.
- Products can tap into shared repositories of both customer-generated and catalog data.
- Existing products are simpler to maintain and introducing change is far less risky.
- A smaller codebase means that the project is much easier to grok.
- Well-defined contracts that have robust automated tests written against them mean that each component can be released independently with confidence.
- Teams can work more efficiently by choosing technologies and frameworks that are tailored to fit specific needs.
- Using standards-compliant web services for an API means that apps written in Java, Ruby on Rails or Node.js can access shared services as easily as a legacy .NET application.
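To make the point about automated tests against well-defined contracts concrete, here is a minimal sketch of a contract check. The field names in `CONTRACT_V1` and the `check_contract` helper are hypothetical, not taken from any SRSWP service. It treats a contract as a set of required fields and types, and it deliberately ignores extra fields so that additive changes do not count as breaking.

```python
# Hypothetical contract: required fields and their types.
CONTRACT_V1 = {"id": int, "name": str, "active": bool}


def check_contract(response, contract):
    """Return a list of contract violations (empty means compliant)."""
    problems = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}"
            )
    return problems


# An extra field ("region") is allowed; consumers can ignore it.
good = {"id": 7, "name": "Shop 7", "active": True, "region": "west"}
# A changed type and a dropped field both break consumers.
bad = {"id": "7", "name": "Shop 7"}

print(check_contract(good, CONTRACT_V1))  # []
print(check_contract(bad, CONTRACT_V1))
# ['wrong type for id: expected int', 'missing field: active']
```

Running a check like this in each component's test suite before every release is one way to gain the confidence to ship independently.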
Following a component-based approach will make the creation of new apps a trivial exercise. It will free us to focus on solving interesting problems rather than being bogged down by operational overhead. The quality of our offering will increase as we become much more responsive to customers.
Applying these principles means something different for each of our existing projects and teams. What remains to be done for your team to fully benefit from this component-based design? What new functionality would you like to see exposed by the SRSWP?
In my first zVision presentation that was recently given at each of the SRS offices, I identified one of Engineering's current problems as an "absence of trust" between teams. This phrase caused some confusion that I would like to clarify.
The API's Contract Definition is essential:
- Method descriptions, examples, limitations and assumptions are all necessary and are included as part of the API documentation.
The API's Contract Stability is essential:
- Breaking changes should be very rare. Even with a disclaimer stating that it may change at any time, there is an implied level of stability in any published API.
- When a breaking change to the API is necessary, backwards compatibility will be provided. That's why their APIs all have "v1" in them!
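A minimal sketch of how a breaking change can ship without breaking existing consumers is shown here. The endpoint paths, field names, and functions are hypothetical. It assumes the breaking change is a renamed field: the new behavior is published under `/v2`, while `/v1` stays available as a thin adapter over the new implementation.

```python
def get_shop_v2(shop_id):
    """New contract: 'name' has been split into two fields (breaking change)."""
    return {
        "id": shop_id,
        "display_name": "Example Shop",
        "legal_name": "Example Shop LLC",
    }


def get_shop_v1(shop_id):
    """Original contract, kept alive by adapting the v2 result."""
    shop = get_shop_v2(shop_id)
    return {"id": shop["id"], "name": shop["display_name"]}


# The version is part of the path, so old and new consumers coexist.
ROUTES = {
    "/v1/shops": get_shop_v1,
    "/v2/shops": get_shop_v2,
}


def dispatch(path, shop_id):
    """Look up the handler for a versioned path and call it."""
    handler = ROUTES.get(path)
    if handler is None:
        raise LookupError(f"no such endpoint: {path}")
    return handler(shop_id)


print(dispatch("/v1/shops", 42))  # {'id': 42, 'name': 'Example Shop'}
print(dispatch("/v2/shops", 42)["legal_name"])  # Example Shop LLC
```

Existing consumers keep calling `/v1` and see no change; new consumers opt in to `/v2` when they are ready.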
Within SRS we should think about cross-project integration similarly to integrating with external services like del.icio.us. Except in very rare situations, touch points between products are limited to APIs (i.e. massively-versioned web services).
When I referred to "absence of trust" in zVision, I was calling attention to the fact that we do not yet have the requisite level of Contract Definition and Contract Stability in our APIs. Without both definition and stability, I could not trust the del.icio.us API enough to base my app on it. The same is true for building on services produced internally at SRS.
From the end of 2010 through 2012 we are making a large investment in re-architecting our products based on principles of SOA and SaaS. APIs that are both well defined and stable are absolutely essential to success. Once we have those things, we can trust the services provided by other SRS teams as much as we would trust del.icio.us.