Server Strategies -- Programming a High Performance WinSock Server (repost)
In this section, we'll take a look at several strategies for handling resources depending on the nature of the server. Also, the more control you have over the design of both the client and the server, the better you can design each to avoid the limitations and bottlenecks discussed previously. Again, there is no foolproof method that will work 100 percent of the time in all situations. Servers can be divided roughly into two categories: high throughput and high connections. A high throughput server is more concerned with pushing data on a small number of connections. Of course, the meaning of the phrase "small number of connections" is relative to the amount of resources available on the server. A high connection server is more concerned with handling a large number of connections and is not attempting to push large amounts of data.
In the next two sections, we'll discuss both high throughput and high connection server strategies. After that, we'll look at performance numbers gathered from the server samples provided on the companion CD.
High Throughput

An FTP server is an example of a high throughput server. It is concerned with delivering bulk content. In this case, the server is concerned with processing each connection so as to minimize the amount of time required to transfer the data. To do so, the server must limit the number of concurrent connections, because the greater the number of simultaneous connections, the lower the throughput will be on each connection. An example would be an FTP server that refuses a connection because it is too busy.
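The refuse-when-busy policy can be sketched in portable terms. This is an illustrative Python sketch, not the chapter's WinSock code; the cap `MAX_CONNECTIONS` and the helper name are assumptions, and a real WinSock server would make the same decision around its accept call.

```python
import socket

MAX_CONNECTIONS = 100  # hypothetical cap; tune to the server's resources

def accept_or_refuse(listener, active_connections):
    """Accept a new client, or refuse it immediately if at capacity.

    Refusing early keeps per-connection resources (buffers, locked pages)
    bounded so throughput on the accepted connections stays high."""
    conn, addr = listener.accept()
    if len(active_connections) >= MAX_CONNECTIONS:
        conn.close()  # too busy: drop the connection right away
        return None
    active_connections.add(conn)
    return conn
```

The same idea applies regardless of I/O model: the cap is a policy decision made before any I/O is posted on the new socket.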
The goal for this strategy is to maximize I/O: the server should keep enough receives or sends posted to maximize throughput. Because each overlapped I/O operation requires its buffer memory to be locked, as well as a small portion of non-paged pool for the IRP associated with the operation, it is important to limit I/O to a small set of connections. It is possible for the server to continually accept connections and have a relatively high number of established connections, but I/O must be limited to a smaller set.
In this case, the server may post a number of sends or receives on a subset of the established clients. For example, the server could handle client connections in a first-in, first-out manner and post a number of overlapped sends and/or receives on the first 100 connections. After those clients are handled, the server can move on to the next set of clients in the queue. In this model, the number of outstanding send and receive operations is limited to a smaller set of connections. This prevents the server from blindly posting I/O operations on every connection, which could quickly exhaust the server's resources.
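The first-in, first-out batching described above can be sketched as follows. This is a minimal Python sketch under assumptions: `IO_WINDOW` and the helper name are hypothetical, and "posting I/O" on the batch stands in for the overlapped sends/receives a WinSock server would issue.

```python
from collections import deque

IO_WINDOW = 100  # hypothetical size of the active I/O set

def next_io_batch(pending: deque, window: int = IO_WINDOW) -> list:
    """Pop up to `window` connections from the FIFO queue.

    The caller posts its sends/receives on just this batch, then requeues
    any connection that still has work, so outstanding I/O stays bounded."""
    batch = []
    while pending and len(batch) < window:
        batch.append(pending.popleft())
    return batch
```

Connections that still have pending work after their batch is serviced are simply appended back onto the queue, so every client is eventually reached.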
The server should take care to monitor the number of operations outstanding on each connection so it may prevent malicious clients from attacking it. For example, a server designed to receive data from a client, process it, and send some sort of response should keep track of how many sends are outstanding. If the client is simply flooding the server with data but not posting any receives, the server may end up posting dozens of overlapped sends that will never complete. In this case, once the server finds that there are too many outstanding operations, it can close the connection.
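The outstanding-operation bookkeeping can be sketched like this. The threshold `MAX_OUTSTANDING_SENDS` and the class name are hypothetical; in a real WinSock server these counters would be updated from the send-posted path and the overlapped-completion handler.

```python
MAX_OUTSTANDING_SENDS = 32  # hypothetical threshold per connection

class ConnectionState:
    """Tracks per-connection outstanding sends so a client that floods
    the server without ever receiving can't cause unbounded queued sends."""

    def __init__(self):
        self.outstanding_sends = 0
        self.closed = False

    def on_send_posted(self):
        """Called each time a send is posted on this connection."""
        self.outstanding_sends += 1
        if self.outstanding_sends > MAX_OUTSTANDING_SENDS:
            # Too many sends that never complete: treat the client as
            # misbehaving and mark the connection for closure.
            self.closed = True

    def on_send_complete(self):
        """Called when a previously posted send finishes."""
        self.outstanding_sends -= 1
```

A well-behaved client keeps the counter oscillating near zero; only a client that stops receiving pushes it past the threshold.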
Maximizing Connections

Maximizing the number of concurrent client connections is the more difficult of the two strategies, because handling the I/O on each connection becomes difficult. A server cannot simply post one or more sends or receives on each connection because the amount of memory required (both in terms of locked pages and non-paged pool) would be great. In this scenario, the server is interested in handling many connections at the expense of throughput on each connection. An example of this would be an instant messenger server: it would handle many thousands of connections but would need to send or receive only a small number of bytes at a time.
For this strategy, the server does not necessarily want to post an overlapped receive on each connection, because this would involve locking many pages for each of the receive buffers. Instead, the server can post an overlapped zero-byte receive. Once the receive completes, the server performs non-blocking receives until WSAEWOULDBLOCK is returned. This allows the server to immediately retrieve all buffered data received on that connection. Because this model is geared toward clients that send data intermittently, it minimizes the number of locked pages but still allows processing of data on each connection.
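The drain-until-would-block step can be modeled in portable terms with a non-blocking receive. In this Python sketch, `BlockingIOError` plays the role WSAEWOULDBLOCK plays in WinSock, and the helper name is an assumption; the zero-byte receive itself is simply whatever readiness notification triggered the call.

```python
import socket

def drain_socket(conn: socket.socket, chunk: int = 4096) -> bytes:
    """After readiness is signaled (the zero-byte receive completing),
    read everything currently buffered without blocking.

    No large receive buffer is locked while the connection is idle; a
    buffer exists only for the duration of this drain."""
    conn.setblocking(False)
    data = bytearray()
    while True:
        try:
            part = conn.recv(chunk)
        except BlockingIOError:
            break          # nothing left buffered right now (WSAEWOULDBLOCK)
        if not part:
            break          # peer closed the connection
        data += part
    return bytes(data)
```

After the drain, the server would post another zero-byte receive and return the connection to its idle, low-memory state.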
Performance Numbers

This section covers performance numbers from the different servers provided in Chapters 5 and 6. The servers tested are those using blocking sockets, non-blocking sockets with select, WSAAsyncSelect, WSAEventSelect, overlapped I/O with events, and overlapped I/O with completion ports. Table 6-3 summarizes the results of these tests. For all of the tests, the server is an echo server: for each connection that is accepted, data is received and sent back to the client. For each I/O model there are several entries. The first entry represents a high-throughput server in which 7000 connections were attempted from three clients and each client sends data as fast as possible; none of the sample servers limits the number of concurrent connections. The remaining entries represent runs in which the clients limit the rate at which they send data so as not to overrun the bandwidth available on the network. The second entry for each I/O model represents 12,000 rate-limited connections from the clients. If the server was able to handle the majority of the 12,000 connections, the third entry is the maximum number of clients the server was able to handle.
As we mentioned, the servers used are those provided in Chapter 5, except that the I/O completion port server is a slightly modified version of the Chapter 5 completion port server that limits the number of outstanding operations: it allows at most 200 outstanding send operations and posts just a single receive on each client connection. The client used in this test is the I/O completion port client from Chapter 5. Connections were established in blocks of 1000 clients by specifying the '-c 1000' option on the client. The two x86-based clients initiated a maximum of 12,000 connections, and the Itanium system was used to establish the remaining clients in blocks of 4000. In the rate-limited tests, each client block was limited to 200,000 bytes per second (using the '-r 200000' switch); that is, the average send throughput for the entire block of clients was limited to 200,000 bytes per second, not each individual client.
Table 6-3

| I/O Model | Attempted/Connected | Memory Used (KB) | Non-Paged Pool | CPU Usage | Threads | Throughput (Send/Receive Bytes Per Second) |
|---|---|---|---|---|---|---|
| Blocking | 7000/1008 | 25,632 | 36,121 | 10–60% | 2016 | 2,198,148/2,198,148 |
| Blocking | 12,000/1008 | 25,408 | 36,352 | 5–40% | 2016 | 404,227/402,227 |
| Non-blocking | 7000/4011 | 4208 | 135,123 | 95–100%* | 1 | 0/0 |
| Non-blocking | 12,000/5779 | 5224 | 156,260 | 95–100%* | 1 | 0/0 |
| WSAAsyncSelect | 7000/1956 | 3640 | 38,246 | 75–85% | 3 | 1,610,204/1,637,819 |
| WSAAsyncSelect | 12,000/4077 | 4884 | 42,992 | 90–100% | 3 | 652,902/652,902 |
| WSAEventSelect | 7000/6999 | 10,502 | 36,402 | 65–85% | 113 | 4,921,350/5,186,297 |
| WSAEventSelect | 12,000/11,080 | 19,214 | 39,040 | 50–60% | 192 | 3,217,493/3,217,493 |
| WSAEventSelect | 46,000/45,933 | 37,392 | 121,624 | 80–90% | 791 | 3,851,059/3,851,059 |
| Overlapped (events) | 7000/5558 | 21,844 | 34,944 | 65–85% | 66 | 5,024,723/4,095,644 |
| Overlapped (events) | 12,000/12,000 | 60,576 | 48,060 | 35–45% | 195 | 1,803,878/1,803,878 |
| Overlapped (events) | 49,000/48,997 | 241,208 | 155,480 | 85–95% | 792 | 3,865,152/3,834,511 |
| Overlapped (completion port) | 7000/7000 | 36,160 | 31,128 | 40–50% | 2 | 6,282,473/3,893,507 |
| Overlapped (completion port) | 12,000/12,000 | 59,256 | 38,862 | 40–50% | 2 | 5,027,914/5,027,095 |
| Overlapped (completion port) | 50,000/49,997 | 242,272 | 148,192 | 55–65% | 2 | 4,326,946/4,326,496 |
The server was a Pentium 4 1.7 GHz Xeon with 768 MB of memory. Clients were established from three machines: a Pentium 2 233 MHz with 128 MB of memory, a Pentium 2 350 MHz with 128 MB of memory, and an Itanium 733 MHz with 1 GB of memory. The test network was a 100 Mbps isolated hub. All of the machines tested had installed.
The blocking model is the poorest performing of all the models. The blocking server spawns two threads for each client connection: one for sending data and one for receiving it. In both test cases, the server was able to handle only a fraction of the connections because it hit a system resource limit on creating threads: the CreateThread call failed with ERROR_NOT_ENOUGH_MEMORY. The remaining client connections failed with WSAECONNREFUSED.
The non-blocking model fared only somewhat better. It was able to accept more connections but ran into a CPU limitation. The non-blocking server puts all the connected sockets into an FD_SET, which is passed into select. When select completes, the server uses the FD_ISSET macro on each socket to determine whether it is signaled. This becomes inefficient as the number of connections increases: just to determine whether a socket is signaled, a linear search through the array is required! To partially alleviate this problem, the server can be redesigned to iterate directly over the FD_SETs returned from select. The only issue is that the server then needs to be able to quickly find the SOCKET_INFO structure associated with a given socket handle. In this case, the server can provide a more sophisticated cataloging mechanism, such as a hash table, which allows quicker lookups. Also note that the non-paged pool usage is extremely high. This is because AFD and TCP are buffering data on the client connections: the server is unable to read the data fast enough, as indicated by the zero throughput and the high CPU usage.
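The cataloging idea above is easy to sketch: a hash-based map gives O(1) lookup from a socket handle to its per-connection state, instead of a linear scan. In this Python sketch, the class name is hypothetical, the handle is any hashable key, and the state payload is a plain dict standing in for the chapter's SOCKET_INFO structure.

```python
class SocketCatalog:
    """O(1) handle-to-state lookup, replacing the linear FD_SET scan.

    A real server would store buffers, byte counts, and flags in the
    info object; here it is left opaque."""

    def __init__(self):
        self._by_handle = {}

    def add(self, handle, info):
        self._by_handle[handle] = info

    def lookup(self, handle):
        """Return the state for `handle`, or None if unknown."""
        return self._by_handle.get(handle)

    def remove(self, handle):
        self._by_handle.pop(handle, None)
```

With this in place, iterating the returned FD_SET and looking up each ready socket's state costs O(ready sockets) rather than O(all sockets) per ready socket.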
The WSAAsyncSelect model is acceptable for a small number of clients but does not scale well because the overhead of the message loop quickly bogs down its capability to process messages fast enough. In both tests, the server is able to handle only about a third of the connections made. The clients receive many WSAECONNREFUSED errors, indicating that the server cannot handle the FD_ACCEPT messages quickly enough, so the listen backlog becomes exhausted. However, even for those connections that are accepted, you will notice that the average throughput is rather low (even in the case of the rate-limited clients).
Surprisingly, the WSAEventSelect model performed very well. In all the tests, the server was, for the most part, able to handle all the incoming connections while obtaining very good data throughput. The drawback to this model is the overhead required to manage the thread pool for new connections. Because each thread can wait on only 64 events, when new connections are established new threads have to be created to handle them. Also, in the last test case in which more than 45,000 connections were established, the machine became very sluggish. This was most likely due to the great number of threads created to service the many connections. The overhead for switching between the 791 threads becomes significant. The server reached a point at which it was unable to accept any more connections due to numerous WSAENOBUFS errors. In addition, the client application reached its limitation and was unable to sustain the already established connections (we'll discuss this in detail later).
The overlapped I/O with events model is similar to WSAEventSelect in terms of scalability. Both models rely on thread pools for event notification, and both reach a limit at which the thread-switching overhead becomes a factor in how well they handle client communication. The performance numbers for this model almost exactly mirror those of WSAEventSelect; it does surprisingly well until the number of threads increases.
The last entry is for overlapped I/O with completion ports, which is the best performing of all the I/O models. The memory usage (both user and non-paged pool) and accepted clients are similar to both the overlapped I/O with events and WSAEventSelect model. However, the real difference is in CPU usage. The completion port model used only around 60 percent of the CPU, but the other two models required substantially more horsepower to maintain the same number of connections. Another significant difference is that the completion port model also allowed for slightly better throughput.
While carrying out these tests, it became apparent that there was a limitation introduced by the nature of the data interaction between client and server. The server is designed to be an echo server, such that all data received from the client is sent back. Also, each client continually sends data (even if at a lower rate) to the server. This results in data always being pending on the server's socket (either in the TCP buffers or in AFD's per-socket buffers, which are all non-paged pool). For the three well-performing models, only a single receive is performed at a time, which means that for the majority of the time there is still data pending. It is possible to modify the server to perform a non-blocking receive once data is indicated on the connection, which would drain the data buffered on the machine. The drawback to this approach in this instance is that the client is constantly sending, so the non-blocking receive could return a great deal of data, leading to starvation of other connections (as the worker thread or completion thread would not be able to handle other events or completion notices). Typically, calling a non-blocking receive until WSAEWOULDBLOCK is returned works well on connections where data is transmitted in intervals and not in a continuous manner.
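One way to address the starvation concern above is to cap how many bytes a single drain pass may consume from one connection. This Python sketch is an assumption-laden illustration (the cap value and names are hypothetical): leftover data simply stays queued in the socket's buffers and is picked up on the next readiness notification, so no single chatty client monopolizes the servicing thread.

```python
import socket

MAX_BYTES_PER_PASS = 16 * 1024  # hypothetical fairness cap per drain pass

def drain_with_cap(conn: socket.socket, cap: int = MAX_BYTES_PER_PASS) -> bytes:
    """Drain buffered data, but stop after `cap` bytes.

    BlockingIOError stands in for WSAEWOULDBLOCK: it means nothing more
    is buffered right now. Hitting the cap means there may be more data,
    which the next pass over this connection will collect."""
    conn.setblocking(False)
    data = bytearray()
    while len(data) < cap:
        try:
            part = conn.recv(min(4096, cap - len(data)))
        except BlockingIOError:
            break
        if not part:
            break  # peer closed
        data += part
    return bytes(data)
```

The trade-off is one extra wakeup per capped connection in exchange for bounded latency on all the others.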
From these performance numbers it is easily deduced that WSAEventSelect and overlapped I/O offer the best performance. For the two event-based models, setting up a thread pool for handling event notification is cumbersome but still allows for excellent performance for a moderately stressed server. Once the number of connections and the number of threads increase, scalability becomes an issue as more CPU is consumed by context switching between threads. The completion port model still offers the ultimate scalability because CPU usage is less of a factor as the number of clients increases.
From "ITPUB Blog", link: http://blog.itpub.net/10752019/viewspace-962926/. Please credit the source when reposting; otherwise legal responsibility may be pursued.