Effective STL: Item 2, Beware the Illusion of Container-Independent Code (repost)

Posted by gugu99 on 2008-01-12

Effective STL
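Item 2: Beware the illusion of container-independent code

The STL is built on generalization. Arrays are generalized into containers, parameterized on the types of objects they hold. Functions are generalized into algorithms, parameterized on the types of iterators they use. Pointers are generalized into iterators, parameterized on the types of objects they point to.
But that is only the start. Individual container types are generalized into sequence containers and associative containers, and similar containers get similar functionality. The standard contiguous-memory containers (see Item 1) provide random access iterators, while the standard node-based containers (see Item 1) provide bidirectional iterators. Sequence containers support push_front and/or push_back; associative containers do not. Associative containers provide lower_bound, upper_bound, and equal_range member functions that run in logarithmic time (translator's note: this is my understanding, and I am not sure it is right); sequence containers do not.
With all this generalization under way, it is natural to want to join in. The impulse is commendable, and when you write your own containers, iterators, and algorithms you will certainly want to pursue it. Unfortunately, many programmers go about it in a different way.
Typically, rather than committing to a particular container type in their software, they try to generalize the notion of a container so that they can use, say, a vector while keeping the option of replacing it later with something like a deque or a list, all without changing the code that uses the container. In other words, they try to write container-independent code.
This kind of generalization, however well meant, is headed in the wrong direction. Even the keenest advocate of container-independent code soon realizes that trying to write software that works with both sequence containers and associative containers makes almost no sense.
Many member functions exist in only one category of container. For example, only sequence containers support push_front or push_back, and only associative containers support count, lower_bound, and so on. Even basic operations such as insert and erase have signatures and semantics that vary with the container category. When you insert an object into a sequence container, it stays where you inserted it; when you insert an object into an associative container, the container moves it to the position dictated by its sort order. As another example, the form of erase that takes an iterator returns a new iterator when called on a sequence container, but returns nothing when called on an associative container (Item 9 gives an example of how this affects the code you write).
Suppose, then, that you want to write code that works with the most common sequence containers: vector, deque, and list. Clearly you must program to the intersection of their capabilities, which means no reserve or capacity (see Item 14), because deque and list do not offer them. The presence of list also means giving up operator[] and limiting yourself to what bidirectional iterators can do. That in turn means you cannot use algorithms that require random access iterators, including sort, stable_sort, partial_sort, and nth_element (see Item 31).
On the other hand, supporting vector rules out push_front and pop_front, and vector and deque both rule out splice and the member form of sort. Combined with the restrictions above, the latter means there is no form of sort you can call on your generalized sequence container.
That is the obvious part: if you violate any of these restrictions, your code will fail to compile with at least one of the containers you want to use. The code that does compile is more dangerous.
The main culprit is that the rules for invalidating iterators, pointers, and references differ among the sequence containers. For your code to work correctly with vector, deque, and list, you must assume that any operation that invalidates iterators, pointers, or references in any one of those containers also invalidates them in the container you are using. You must therefore assume that every call to insert invalidates everything, because deque::insert invalidates all iterators and, since you cannot call capacity, vector::insert must be assumed to invalidate all pointers and references. (Item 1 explains that deque is unusual in that it sometimes invalidates its iterators without invalidating its pointers and references.) By the same reasoning, every call to erase must be assumed to invalidate everything.
Want more examples? You cannot pass the data in the container to a C interface, because only vector supports that (see Item 16). You cannot instantiate your container with bool as the type of the stored objects because, as Item 18 explains, vector<bool> does not always behave like a vector and never actually stores bools. You cannot assume list's constant-time insertion and erasure, because vector and deque take linear time for those operations.
When all is said and done, what you are left with is a generalized sequence container on which you cannot call reserve, capacity, operator[], push_front, pop_front, splice, or any algorithm that requires random access iterators; whose every insert and erase takes linear time and invalidates all iterators, pointers, and references; and which is incompatible with C and cannot store bools. Is that the container you want to use in your applications? I think not.
Suppose you rein in your ambition and decide to drop support for list. You still give up reserve, capacity, push_front, and pop_front; you still must assume that every call to insert and erase takes linear time and invalidates everything; you still lose the C-compatible memory layout (translator's note: I believe "memory layout" is what is meant); and you still cannot store bools.
If you abandon the sequence containers and aim instead for code that works with different associative containers, things are not much better. Writing such code for both set and map is close to impossible, because sets store single objects while maps store pairs of objects. Even writing for both set and multiset (or map and multimap) is hard: the insert member function that takes only a value (translator's note: sorry, I did not check; I am guessing this means a parameter) has a different return type for sets and maps than for their multi counterparts, and you must carefully avoid any assumption about how many copies of a value are stored in the container. With map and multimap you must avoid operator[], because only map has that member function.
Face reality: it is not worth it. Different containers have their own strengths and weaknesses in different situations. They were not designed to be interchangeable, and there is little you can do to paper over that. If you try, you are only tempting fate, and fate does not like to be tempted. Even so, the day will come when you realize that a container you chose was, well, not the best, and you need a different container type. You now know that when you change container types you must not only fix every problem your compiler diagnoses; you must also examine all the code that uses the container to see what has to change in light of the new container's performance characteristics and its rules for invalidating iterators, pointers, and references.
If you switch from vector to some other container, you must also make sure you no longer rely on vector's C-compatible memory layout; if you switch from another container to vector, you must make sure you are not using it to store bools.
Since changing container types from time to time is unavoidable, you can make such changes easier in the usual way: encapsulate, encapsulate, encapsulate. One of the simplest ways is to use typedefs liberally for container and iterator types. So instead of writing this: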
class Widget { ... };
vector<Widget> vw;
Widget bestWidget;
... // give bestWidget a value
vector<Widget>::iterator i = // find a Widget with the
    find(vw.begin(), vw.end(), bestWidget); // same value as bestWidget

write this:

class Widget { ... };
typedef vector<Widget> WidgetContainer;
typedef WidgetContainer::iterator WCIterator;
WidgetContainer vw;
Widget bestWidget;
...
WCIterator i = find(vw.begin(), vw.end(), bestWidget);

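This makes changing container types much easier, which is especially convenient when the change is simply to add a custom allocator. (Such a change does not affect the rules for invalidating iterators, pointers, and references.)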

class Widget { ... };
template<typename T> // see Item 10 for why this
class SpecialAllocator { ... }; // needs to be a template
typedef vector<Widget, SpecialAllocator<Widget> > WidgetContainer;
typedef WidgetContainer::iterator WCIterator;
WidgetContainer vw; // still works
Widget bestWidget;
...
WCIterator i = find(vw.begin(), vw.end(), bestWidget); // still works

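Even if the encapsulation that typedefs provide means nothing to you, you will probably still appreciate the work they save. For example, if you have an object of type
map<string, vector<Widget>::iterator,
    CIStringCompare> // CIStringCompare is “case-
// insensitive string compare;”
// Item 19 describes it

and you want to walk through the map using const_iterators, do you really want to spell out the following more than once?
map<string, vector<Widget>::iterator, CIStringCompare>::const_iterator

Once you have used the STL for a while, you will realize that typedefs are your friends. A typedef is just a synonym for another type, so the encapsulation it offers is purely lexical. A typedef does not stop a client from doing (or depending on) anything they could not already do (or depend on). If you want to limit how much of your container choice is exposed to clients, you need stronger encapsulation: you need classes.
To limit the code that must change when you replace one container with another, hide the container in a class and limit the amount of container-specific information visible through the class interface. For example, if you need to create a customer list, do not use a list directly. Instead, create a CustomerList class and hide a list in its private section: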

class CustomerList {
private:
typedef list<Customer> CustomerContainer;
typedef CustomerContainer::iterator CCIterator;
CustomerContainer customers;
public: // limit the amount of list-specific
... // information visible through
}; // this interface

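At first glance this may seem a little silly. After all, a customer list is a list, right? Well, maybe. Later you may find that you do not need to insert or erase customers in the middle of the list as often as you expected, but you do need to quickly identify the top 20% of your customers, a task tailor-made for the nth_element algorithm (see Item 31). But nth_element requires random access iterators, so it cannot be used with a list. In that case, your customer "list" may be better implemented as a vector or a deque.
When you consider a change like this, you still have to check every CustomerList member function and every friend to see how they are affected (in terms of performance, iterator/pointer/reference invalidation rules, and so on). But if you have encapsulated CustomerList's implementation details well, the impact on CustomerList's clients will be small. You cannot write container-independent code, but they may be able to.

(Translator's note: this is one Item from Scott Meyers's new book, Effective STL: 50 Specific Ways to Improve Your Use of the Standard Template Library. Addison-Wesley has published a few of the Items on its website. It seemed too good to keep to myself, so I am sharing it here in the hope that it helps. This translation is my first attempt, and I am not satisfied with many parts of it myself; I am posting it so that readers can offer corrections.)

The original text follows: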
Item 2: Beware the illusion of container-independent code.
The STL is based on generalization. Arrays are generalized into containers and parameterized on the types of objects they contain. Functions are generalized into algorithms and parameterized on the types of iterators they use. Pointers are generalized into iterators and parameterized on the type of objects they point to.
That’s just the beginning. Individual container types are generalized into sequence and associative containers, and similar containers are given similar functionality. Standard contiguous-memory containers (see Item 1) offer random access iterators, while standard node-based containers (again, see Item 1) provide bidirectional iterators. Sequence containers support push_front and/or push_back, while associative containers don’t. Associative containers offer logarithmic-time lower_bound, upper_bound, and equal_range member functions, but sequence containers don’t.
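As a small illustration of those iterator categories, the sketch below asks std::iterator_traits which category each container's iterator reports. The helper name hasRandomAccessIterators is hypothetical, not something from the book, and the code assumes a C++11 compiler for <type_traits>.

```cpp
#include <cassert>
#include <deque>
#include <iterator>
#include <list>
#include <set>
#include <type_traits>
#include <vector>

// Hypothetical helper: true when Container's iterators report the
// random access category through std::iterator_traits.
template <typename Container>
bool hasRandomAccessIterators() {
    typedef typename std::iterator_traits<
        typename Container::iterator>::iterator_category Category;
    return std::is_same<Category, std::random_access_iterator_tag>::value;
}
```

Called with vector or deque this returns true; with list or set it returns false, because those containers offer only bidirectional iterators.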
With all this generalization going on, it’s natural to want to join the movement. This sentiment is laudable, and when you write your own containers, iterators, and algorithms, you’ll certainly want to pursue it. Alas, many programmers try to pursue it in a different manner.
Instead of committing to particular types of containers in their software, they try to generalize the notion of a container so that they can use, say, a vector, but still preserve the option of replacing it with something like a deque or a list later — all without changing the code that uses it. That is, they strive to write container-independent code.
This kind of generalization, well-intentioned though it is, is almost always misguided. Even the most ardent advocate of container-independent code soon realizes that it makes little sense to try to write software that will work with both sequence and associative containers.
Many member functions exist for only one category of container, e.g., only sequence containers support push_front or push_back, and only associative containers support count and lower_bound, etc. Even such basics as insert and erase have signatures and semantics that vary from category to category. For example, when you insert an object into a sequence container, it stays where you put it, but if you insert an object into an associative container, the container moves the object to where it belongs in the container’s sort order. For another example, the form of erase taking an iterator returns a new iterator when invoked on a sequence container, but it returns nothing when invoked on an associative container. (Item 9 gives an example of how this can affect the code you write.)
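The insert difference can be seen in a minimal sketch. The two helper functions are hypothetical names chosen for illustration. Note also that the erase return-type mismatch describes the C++98 library the book targets; since C++11 the iterator form of erase on associative containers returns an iterator as well.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical helper: insert 5 at the front of a vector holding {10, 20}.
// A sequence container keeps the element exactly where it was put.
std::vector<int> insertIntoVector() {
    std::vector<int> v;
    v.push_back(10);
    v.push_back(20);
    v.insert(v.begin(), 5);   // v is now {5, 10, 20}
    return v;
}

// Hypothetical helper: insert 15 into a set holding {10, 20}, asking for
// the end position. The position is only a hint; the set keeps sort order.
std::vector<int> insertIntoSet() {
    std::set<int> s;
    s.insert(10);
    s.insert(20);
    s.insert(s.end(), 15);    // ends up between 10 and 20
    return std::vector<int>(s.begin(), s.end());
}
```

The same "insert at a position" call therefore means two different things, which is why code written against one category rarely carries over to the other.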
Suppose, then, you aspire to write code that can be used with the most common sequence containers: vector, deque, and list. Clearly, you must program to the intersection of their capabilities, and that means no uses of reserve or capacity (see Item 14), because deque and list don’t offer them. The presence of list also means you give up operator[], and you limit yourself to the capabilities of bidirectional iterators. That, in turn, means you must stay away from algorithms that demand random access iterators, including sort, stable_sort, partial_sort, and nth_element (see Item 31).
On the other hand, your desire to support vector rules out use of push_front and pop_front, and both vector and deque put the kibosh on splice and the member form of sort. In conjunction with the constraints above, this latter prohibition means that there is no form of sort you can call on your “generalized sequence container.”
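A sketch of why no single sort call fits all three containers: the overload set below (sortContainer is a hypothetical name) has to pick a different mechanism per container type, which is exactly what container-independent code cannot do.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <list>
#include <vector>

// std::sort needs random access iterators, so it works for vector and deque.
void sortContainer(std::vector<int>& c) { std::sort(c.begin(), c.end()); }
void sortContainer(std::deque<int>& c)  { std::sort(c.begin(), c.end()); }

// list offers only bidirectional iterators; std::sort(c.begin(), c.end())
// would not compile here, so the member function is the only option.
void sortContainer(std::list<int>& c)   { c.sort(); }
```

Neither body can be shared: vector and deque have no member sort, and list cannot be passed to std::sort.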
That’s the obvious stuff. If you violate any of those restrictions, your code will fail to compile with at least one of the containers you want to be able to use. The code that will compile is more insidious.
The main culprit is the different rules for invalidation of iterators, pointers, and references that apply to different sequence containers. To write code that will work correctly with vector, deque, and list, you must assume that any operation invalidating iterators, pointers, or references in any of those containers invalidates them in the container you’re using. Thus, you must assume that every call to insert invalidates everything, because deque::insert invalidates all iterators and, lacking the ability to call capacity, vector::insert must be assumed to invalidate all pointers and references. (Item 1 explains that deque is unique in sometimes invalidating its iterators without invalidating its pointers and references.) Similar reasoning leads to the conclusion that every call to erase must be assumed to invalidate everything.
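The contrast between vector and list can be sketched as follows. The function names are hypothetical, and the vector case relies only on the fact that growing past the current capacity forces a reallocation.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Returns true if inserting past the current capacity made the vector
// reallocate, which invalidates every iterator, pointer, and reference.
bool vectorReallocatedOnInsert() {
    std::vector<int> v(4, 0);
    const std::vector<int>::size_type oldCapacity = v.capacity();
    // Insert just enough elements to exceed the old capacity by one.
    v.insert(v.end(), oldCapacity - v.size() + 1, 0);
    return v.capacity() > oldCapacity;
}

// A list never relocates existing nodes, so a pointer taken before an
// insertion still refers to the same element afterwards.
bool listPointerSurvivesInsert() {
    std::list<int> l(4, 0);
    int* first = &l.front();
    l.insert(l.begin(), 42);   // new node goes in front of the old first
    *first = 7;                // still valid: writes to the old first element
    std::list<int>::iterator second = l.begin();
    ++second;
    return &*second == first && *second == 7;
}
```

Code that must work with both containers has to behave as if the vector rule applied everywhere, even when the container in use is a list.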
Want more? You can’t pass the data in the container to a C interface, because only vector supports that (see Item 16). You can’t instantiate your container with bool as the type of objects to be stored, because, as Item 18 explains, vector<bool> doesn’t always behave like a vector, and it never actually stores bools. You can’t assume list’s constant-time insertions and erasures, because vector and deque take linear time to perform those operations.
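A minimal sketch of the C-interface point, assuming a stand-in function sumArray in place of a real C API (both function names are hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a C function that expects a plain contiguous array.
int sumArray(const int* data, std::size_t count) {
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) total += data[i];
    return total;
}

// Passing vector data to the C-style function. &v[0] is valid only when
// the vector is non-empty, so the empty case is handled separately.
int sumVector(const std::vector<int>& v) {
    return v.empty() ? 0 : sumArray(&v[0], v.size());
}

// The same approach fails for std::vector<bool>: it packs its values into
// bits, so `bool* p = &flags[0];` does not compile for a vector<bool> flags.
```

Neither deque nor list guarantees one contiguous block, so sumVector has no equivalent for them short of copying the elements out first.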
When all is said and done, you’re left with a “generalized sequence container” where you can’t call reserve, capacity, operator[], push_front, pop_front, splice, or any algorithm requiring random access iterators; a container where every call to insert and erase takes linear time and invalidates all iterators, pointers, and references; and a container incompatible with C where bools can’t be stored. Is that really the kind of container you want to use in your applications? I suspect not.
If you rein in your ambition and decide you’re willing to drop support for list, you still give up reserve, capacity, push_front, and pop_front; you still must assume that all calls to insert and erase take linear time and invalidate everything; you still lose layout compatibility with C; and you still can’t store bools.
If you abandon the sequence containers and shoot instead for code that can work with different associative containers, the situation isn’t much better. Writing for both set and map is close to impossible, because sets store single objects while maps store pairs of objects. Even writing for both set and multiset (or map and multimap) is tough. The insert member function taking only a value has different return types for sets/maps than for their multi cousins, and you must religiously avoid making any assumptions about how many copies of a value are stored in a container. With map and multimap, you must avoid using operator[], because that member function exists only for map.
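The differing return types of single-value insert can be sketched like this (the helper names are hypothetical):

```cpp
#include <cassert>
#include <set>
#include <utility>

// set::insert(value) returns pair<iterator, bool>; the bool reports
// whether the value was actually added.
bool insertedIntoSet(std::set<int>& s, int value) {
    std::pair<std::set<int>::iterator, bool> result = s.insert(value);
    return result.second;
}

// multiset::insert(value) returns just an iterator, because insertion
// always succeeds. The return value here is how many copies now exist.
std::multiset<int>::size_type insertIntoMultiset(std::multiset<int>& ms,
                                                 int value) {
    std::multiset<int>::iterator pos = ms.insert(value);
    return ms.count(*pos);
}
```

Code that reads `.second` from the result compiles only for set and map, and code that assumes one copy per value is wrong for the multi containers.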
Face the truth: it’s not worth it. The different containers are different, and they have strengths and weaknesses that vary in significant ways. They’re not designed to be interchangeable, and there’s little you can do to paper that over. If you try, you’re merely tempting fate, and fate doesn’t like to be tempted. Still, the day will dawn when you’ll realize that a container choice you made was, er, suboptimal, and you’ll need to use a different container type. You now know that when you change container types, you’ll not only need to fix whatever problems your compilers diagnose, you’ll also need to examine all the code using the container to see what needs to be changed in light of the new container’s performance characteristics and rules for invalidation of iterators, pointers, and references.
If you switch from a vector to something else, you’ll also have to make sure you’re no longer relying on vector’s C-compatible memory layout, and if you switch to a vector, you’ll have to ensure that you’re not using it to store bools.
Given the inevitability of having to change container types from time to time, you can facilitate such changes in the usual manner: by encapsulating, encapsulating, encapsulating. One of the easiest ways to do this is through the liberal use of typedefs for container and iterator types. Hence, instead of writing this,
class Widget { ... };
vector<Widget> vw;
Widget bestWidget;
... // give bestWidget a value
vector<Widget>::iterator i = // find a Widget with the
    find(vw.begin(), vw.end(), bestWidget); // same value as bestWidget

write this:

class Widget { ... };
typedef vector<Widget> WidgetContainer;
typedef WidgetContainer::iterator WCIterator;
WidgetContainer vw;
Widget bestWidget;
...
WCIterator i = find(vw.begin(), vw.end(), bestWidget);

This makes it a lot easier to change container types, something that’s especially convenient if the change in question is simply to add a custom allocator. (Such a change doesn’t affect the rules for iterator/pointer/reference invalidation.)

class Widget { ... };
template<typename T> // see Item 10 for why this
class SpecialAllocator { ... }; // needs to be a template
typedef vector<Widget, SpecialAllocator<Widget> > WidgetContainer;
typedef WidgetContainer::iterator WCIterator;
WidgetContainer vw; // still works
Widget bestWidget;
...
WCIterator i = find(vw.begin(), vw.end(), bestWidget); // still works

If the encapsulating aspects of typedefs mean nothing to you, you’re still likely to appreciate the work they can save. For example, if you have an object of type

map<string, vector<Widget>::iterator,
    CIStringCompare> // CIStringCompare is “case-
// insensitive string compare;”
// Item 19 describes it
and you want to walk through the map using const_iterators, do you really want to spell out

map<string, vector<Widget>::iterator, CIStringCompare>::const_iterator

more than once? Once you’ve used the STL a little while, you’ll realize that typedefs are your friends. A typedef is just a synonym for some other type, so the encapsulation it affords is purely lexical. A typedef doesn’t prevent a client from doing (or depending on) anything they couldn’t already do (or depend on). You need bigger ammunition if you want to limit client exposure to the container choices you’ve made. You need classes.
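Returning to the typing that typedefs save, here is a sketch of the map example with the long type named once. The typedef names WidgetIndex and WidgetIndexCIter are hypothetical, and the CIStringCompare shown is only an assumed stand-in for the one Item 19 describes.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

class Widget {};

// Assumed stand-in for Item 19's CIStringCompare: a case-insensitive
// "less than" for strings.
struct CIStringCompare {
    static bool lessNoCase(char a, char b) {
        return std::tolower(static_cast<unsigned char>(a)) <
               std::tolower(static_cast<unsigned char>(b));
    }
    bool operator()(const std::string& lhs, const std::string& rhs) const {
        return std::lexicographical_compare(lhs.begin(), lhs.end(),
                                            rhs.begin(), rhs.end(),
                                            lessNoCase);
    }
};

// Each long type is spelled out once; every later use stays short.
typedef std::map<std::string, std::vector<Widget>::iterator,
                 CIStringCompare> WidgetIndex;
typedef WidgetIndex::const_iterator WidgetIndexCIter;

// Counts entries by walking the map with const_iterators.
std::size_t countEntries(const WidgetIndex& index) {
    std::size_t n = 0;
    for (WidgetIndexCIter it = index.begin(); it != index.end(); ++it) ++n;
    return n;
}
```

As the text says, this is purely lexical: a client can still write out the full map type and depend on it, which is the gap that a class closes.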
To limit the code that may require modification if you replace one container type with another, hide the container in a class, and limit the amount of container-specific information visible through the class interface. For example, if you need to create a customer list, don’t use a list directly. Instead, create a CustomerList class, and hide a list in its private section:

class CustomerList {
private:
typedef list<Customer> CustomerContainer;
typedef CustomerContainer::iterator CCIterator;
CustomerContainer customers;
public: // limit the amount of list-specific
... // information visible through
}; // this interface

At first, this may seem silly. After all a customer list is a list, right? Well, maybe. Later you may discover that you don’t need to insert or erase customers from the middle of the list as often as you’d anticipated, but you do need to quickly identify the top 20% of your customers — a task tailor-made for the nth_element algorithm (see Item 31). But nth_element requires random access iterators. It won’t work with a list. In that case, your customer “list” might be better implemented as a vector or a deque.
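A sketch of what the class might look like after such a switch, under stated assumptions: the Customer fields, the add and size members, and topTwentyPercent are all hypothetical, since the original leaves the public interface elided.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical Customer type; the original does not define it.
struct Customer {
    std::string name;
    double revenue;
};

class CustomerList {
private:
    // Moving from list to vector changes this typedef and the member
    // function bodies below, but not the calls clients make.
    typedef std::vector<Customer> CustomerContainer;
    typedef CustomerContainer::iterator CCIterator;
    CustomerContainer customers;

    static bool higherRevenue(const Customer& a, const Customer& b) {
        return a.revenue > b.revenue;
    }

public:
    void add(const Customer& c) { customers.push_back(c); }
    std::size_t size() const { return customers.size(); }

    // Returns copies of the top 20% of customers by revenue, in no
    // particular order within that group. Reorders the stored customers.
    std::vector<Customer> topTwentyPercent() {
        const std::size_t n = customers.size() / 5;
        CCIterator nth = customers.begin() + n;
        std::nth_element(customers.begin(), nth, customers.end(),
                         higherRevenue);
        return std::vector<Customer>(customers.begin(), nth);
    }
};
```

With a list as the private container, topTwentyPercent could not use nth_element at all; with the container hidden, that decision stays inside the class.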

When you consider this kind of change, you still have to check every CustomerList member function and every friend to see how they’ll be affected (in terms of performance and iterator/pointer/reference invalidation, etc.), but if you’ve done a good job of encapsulating CustomerList’s implementation details, the impact on CustomerList clients should be small. You can’t write container-independent code, but they might be able to.


 


Source: ITPUB blog, link: http://blog.itpub.net/10748419/viewspace-997150/. When reposting, please credit the source; otherwise legal liability will be pursued.
