[English] Michael Nygard, Author of Release It!: Questioning the Most Basic Assumptions of Software Development (iTuring Interview)

Posted by 盼盼姐 on 2015-04-14

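Michael T. Nygard is a veteran programmer with more than twenty years in the industry and is currently chief architect at Cognitect. He has been called a "roving problem solver" for online businesses. Nygard has delivered running systems to the U.S. government, the military, banking, finance, agriculture, and retail. That operational experience changed his views on software architecture and gave him a distinctive perspective on building high-performance, high-reliability software in a decidedly hostile environment. He has written numerous articles and editorials and is a contributing author of two classic works on software architecture, Beautiful Architecture and 97 Things Every Software Architect Should Know. His most recently published book, Release It! Design and Deploy Production-Ready Software, lays out in detail the problems that can arise before software is released and how to address them. Every topic in the book is illustrated with real cases the author has studied himself.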

iTuring: You have mentioned in your blog that you might write some new books (Three Book Ideas). Are there any developments with these books?

There's an amazing response any time you ask an author that question. He will look nervous, begin sweating, and mumble something incoherent, all while looking around wildly for the nearest exit. I'll just say that I have nothing to announce at this time.

iTuring: Some of the patterns described in Release It! are widely applied these days, such as Circuit Breaker, which Netflix's Hystrix implements quite well. Release It! was published in 2007. Eight years later, have you found any new stability or capacity patterns?

There's one major pattern that manifests in two ways: asynchronous style and reactive style. I see them as two sides of the same coin. Because many stability problems result from blocked threads, both of these styles help.
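As an illustration of why an asynchronous style helps with blocked threads, here is a minimal Python sketch. It is not taken from the book or from Hystrix. The names `slow_dependency` and `call_with_timeout`, the delay, and the fallback value are all hypothetical. It only shows that a bounded, non-blocking wait lets many calls to a hung dependency share one thread and fail fast.

```python
import asyncio


async def slow_dependency(delay_s: float = 10.0) -> str:
    # Hypothetical stand-in for a remote call that hangs for delay_s seconds.
    await asyncio.sleep(delay_s)
    return "response"


async def call_with_timeout(timeout_s: float, delay_s: float = 10.0) -> str:
    # Bound the wait. On timeout the pending call is cancelled and the
    # caller gets a fallback instead of waiting indefinitely.
    try:
        return await asyncio.wait_for(slow_dependency(delay_s), timeout=timeout_s)
    except asyncio.TimeoutError:
        return "fallback"


async def main() -> list:
    # 100 concurrent calls run on a single thread; none of them blocks it.
    return await asyncio.gather(*(call_with_timeout(0.05) for _ in range(100)))


if __name__ == "__main__":
    results = asyncio.run(main())
    print(len(results), set(results))
```

In a thread-per-request design, each of those hung calls would hold a thread until the dependency answered. Here every caller gives up after the timeout and the event loop stays free.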

iTuring: Sometimes a simple mistake can bring down an entire system. Is the problem really a single line of one programmer's code? What mechanisms can be introduced to keep complex systems stable?

Some problems really do begin with a single line of code, but there are always other factors that amplify the problem. Something may change in the external environment that causes a latent bug to manifest. Or an operator's action may cause a problem to surface with code that would normally not be executed.

Some problems, however, emerge due to the large-scale structure of the system. For example, I do not like the "entity service" model in SOA. The reason is that every application needs many entities. The laws of probability tell us that the extended system is likely to malfunction when any entity service is not working.
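The probability argument can be made concrete with a small sketch. It assumes, purely for illustration, that entity services fail independently and that each one is available 99.9% of the time. Neither figure comes from the interview, and `combined_availability` is a hypothetical helper.

```python
def combined_availability(per_service: float, n_services: int) -> float:
    """Availability of an application that needs all n services up at once.

    Assumes independent failures, which is an illustrative simplification.
    """
    return per_service ** n_services


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(n, combined_availability(0.999, n))
```

Under these assumptions, an application that depends on 20 entity services is fully working only about 98% of the time, even though each individual service is at 99.9%.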

So, I try to create resilience (and even antifragility) at both the micro and macro scale. At the micro scale, I use design patterns like those in the book. At the macro scale, I analyze the "failure domains" in the system. That is, when one component (hardware or software) fails, what is the span of affected applications and features? It is often possible to separate the system into isolated failure domains by reallocating functionality among the applications and by splitting entities into facets.
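One way to explore the failure-domain question is to walk a dependency map. The sketch below is an assumption-laden illustration, not Nygard's method. The component names and the `DEPENDS_ON` map are invented, and a real analysis would also cover hardware and shared infrastructure.

```python
from collections import deque

# Hypothetical dependency map: each application -> components it needs.
DEPENDS_ON = {
    "checkout": {"orders-db", "pricing"},
    "search": {"catalog"},
    "pricing": {"catalog"},
    "reports": {"orders-db"},
}


def failure_domain(failed: str, depends_on: dict[str, set[str]]) -> set[str]:
    """Return every application that directly or transitively needs `failed`."""
    affected: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for app, deps in depends_on.items():
            if current in deps and app not in affected:
                affected.add(app)
                queue.append(app)  # its own dependents are affected too
    return affected


if __name__ == "__main__":
    print(sorted(failure_domain("catalog", DEPENDS_ON)))
```

In this made-up map, a `catalog` failure reaches `checkout` through `pricing`. Moving functionality so that `checkout` no longer needs `pricing` at request time would shrink that failure domain.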

iTuring: Does a complex business lead to complex systems? As an architect, how do you keep the software simple without compromising complex business models?

So far, I have not found a correlation between complex businesses and complex systems. I've found the strongest predictor of system complexity to be regulation.

iTuring: How does DevOps differ from traditional operations engineering?

DevOps emphasizes empathy. In a DevOps culture, developers care how their application affects operators as people. Does my application mean the administrator must stay awake late to do deployments? How can I change my application so she can spend time with her family instead of in a terminal? Operators reciprocate: How can we create an environment that lets developers create and deliver value with courage?

iTuring: From the client/server (C/S) and browser/server (B/S) architectures of 2007 to today's apps and NoSQL, the internet industry has been transformed, and agile methodologies have evolved as well. What has changed about software releases over the years, and what remains unchanged?

There are three things I think are the biggest changes:

First is the fall of the Sun and Microsoft hegemony. Back then, nearly all corporate development was in Java or .NET, with an up-and-coming Ruby on Rails community. Today, it's common to see systems that use many different languages and runtime environments.

Second, cloud deployment environments have dramatically changed the economics.

Third, and largely a result of the first two, open source operations tools have democratized high-reliability operations. In 2007, it cost millions of dollars to roll out data center automation, centralized management, and monitoring. Today, you can download all of that.

iTuring: As the mobile internet and cloud services gain wider public acceptance, the IT industry is changing radically. Which technology ideas would you recommend architects focus on?

Enterprise architects have previously focused on the technology "inside the box" on the diagram. That is, they aimed for technology standardization in the implementation details.

In today's world, I think architects must be much more concerned with data formats and representations. That is, they must focus on the arrows, not the boxes.

iTuring: Relevance codes primarily in Clojure, which is very different from the mainstream languages (C, Java, C#) that most companies use. How do you expect programming languages to evolve in the future?

I'm not a very good person to ask about this. All I can report is that I see many developers moving toward functional programming.

iTuring: At Relevance, your developers spend Fridays on pet projects and open source software. That is 20% of your work time, which is a big allocation. How do you benefit from this weekly practice? Does it make up for the lost time?

One quick note: Relevance was renamed Cognitect in August 2013.

We've created some things in 20% time that many people would recognize, including the web framework Pedestal and the initial ClojureScript implementation. Today, our 20% time continues to go into developing Clojure, ClojureScript, Pedestal, and some new things that we'll unveil soon.

We have a long history of questioning our most basic assumptions about software development, and examining our own work to find better ways to build software. That extends to 20% time. So it's not just something we keep doing by habit or routine. We frequently assess whether it is worth it.

So far, we have always found it to be worthwhile. We are serious about making software development better for everyone. Our open source tools are part of that.

iTuring: Why did you write the tool Simulant for simulation testing? How is this project going?

Although I've been talking about Simulant a lot, it was written by Stuart Halloway, based on an architecture from Rich Hickey.

The Simulant library itself is stable for now. My focus is on helping people apply it successfully. To that end, I did a webinar about it last year. I've also made a sample project that you can find on GitHub (https://github.com/mtnygard/simulant-example).

Right now, I'm working on a "solution blueprint" that should also help people do simulation testing, with or without Simulant itself.

