"Application Server Sizing Guidelines", v2.1
*********************************************************************************************
The following document is NOT an endorsed NETSCAPE, SUN, or AOL publication. It is being provided with no guarantees or promises as to the validity or accuracy of the content within. The document is provided for educational purposes under the understanding that the information is considered confidential and not for public distribution.
This is provided AS IS. Please forward all corrections, comments, etc directly to erniep@netscape.com
*********************************************************************************************
Thanks to those of you that caught my math mistake in one of the examples below. In my eagerness to get the new examples up, I didn't check my work. For those of you that found it, I failed to multiply 350 by 6.
Important numbers to know . . .
TPS - transactions per second (that involve NAS in any way, or each execution of a given AppLogic represents a transaction)
Latency - time between reqstart and reqexit in KXS, or the amount of time that a request takes from start to finish in NAS
Buffer - the maximum utilization target. Proper sizing never works a properly sized machine to more than 80% utilization, thereby leaving a little breathing room for the machine.
Threads per transaction - number of threads used per physical transaction request, usually one, unless multiple objects are called from a primary object, each on their own thread.
TPT - (Total number of threads called when all objects under normal load are called) / (Total number of objects called)
NEW!!! Cluster Size - considering that iAS can have larger clusters than just two machines, this is the factor that allows for accurate management of the workload at peak, and leaves the right amount of resources in the cluster in the event of a system or multisystem failure. The following gets a little thick, but hang with it.
GCS - Gross cluster size
MF - Maximum hardware failure: in a given cluster, how many machines can go down in a worst case scenario
Example: Client has trading site with 100 tps at peak, and 2 seconds latency under that load. They accept that the reasonable buffer is 80% of system resources dedicated to application server executions. They want a big bandwidth backend and they want to be able to survive multiple failures. They want to size the backend at 2 application servers in a cluster (GCS), but sized to survive the real time failure of up to 1 machine (MF). In conclusion, 1 server has to be sized to handle the workload of 2 machines in the event of a worst case machine failure. They have 200 objects invoked under normal site use, and those objects can use a total of 240 threads.
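The threads per transaction (TPT) factor for this example comes from the two object and thread counts just given. A minimal sketch of that division, assuming the function and parameter names below (they are mine, not part of NAS or iAS):

```python
def threads_per_transaction(total_threads: int, total_objects: int) -> float:
    """TPT = (total threads used under normal load) / (total objects called)."""
    if total_objects <= 0:
        raise ValueError("total_objects must be positive")
    return total_threads / total_objects


# 200 objects using a total of 240 threads, as in the example above.
tpt = threads_per_transaction(total_threads=240, total_objects=200)
print(tpt)  # 1.2
```

This 1.2 is the TPT value that shows up as the final multiplier in the sizing calculation.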
((Buffer %) * ((GCS - MF) / GCS)) to determine peak application server sized load per each machine of two, assuming no failures
80% * ((2 - 1) / 2)
80% * (1 / 2)
80% * .5
40% application server load per machine
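The per-machine load step can be checked with a few lines of Python. This is only a sketch of the arithmetic above; the function name is an assumption, and Fraction is used so the percentages come out exact.

```python
from fractions import Fraction


def per_machine_load_pct(buffer_pct: int, gcs: int, mf: int) -> Fraction:
    """(Buffer %) * ((GCS - MF) / GCS): peak load per machine with no failures."""
    if not 0 <= mf < gcs:
        raise ValueError("need 0 <= MF < GCS")
    return Fraction(buffer_pct) * Fraction(gcs - mf, gcs)


# 80% buffer, 2 machines in the cluster, sized to survive 1 failure.
print(float(per_machine_load_pct(80, gcs=2, mf=1)))  # 40.0
```

The same function gives 48 for a cluster of 5 that must survive 2 failures, and 60 for a cluster of 4 that must survive 1.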
(((TPS * Latency) / ((GCS - MF) / GCS)) / (Buffer% / 100%)) * (TPT)
(((100 t/sec * 2 sec) / ((2 - 1) / 2)) / (80% / 100%)) * 1.2
((200 / ( 1 / 2)) / .8) * 1.2
((200 / .5) / .8) * 1.2
(400 / .8) * 1.2
500 * 1.2
600 transactions - This is the concurrent volume that each application server has to handle.
Therefore . . .
The actual formula for sizing for App Servers
(((Trans/second * Latency/sec) / ((Gross Cluster Size - Maximum Hardware Failure) / Gross Cluster Size)) / (Buffer% / 100%)) * (Threads/Transaction)
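For anyone who wants to plug in their own numbers, the formula translates directly into a small function. This is a sketch, not an official tool; the function and parameter names are assumptions.

```python
def concurrent_load(tps: float, latency_s: float, gcs: int, mf: int,
                    buffer_pct: float, tpt: float) -> float:
    """(((TPS * Latency) / ((GCS - MF) / GCS)) / (Buffer% / 100%)) * TPT."""
    if not 0 <= mf < gcs:
        raise ValueError("need 0 <= MF < GCS")
    surviving_fraction = (gcs - mf) / gcs          # share of the cluster still up
    in_flight = tps * latency_s                    # transactions in flight at peak
    return (in_flight / surviving_fraction) / (buffer_pct / 100.0) * tpt


# First example: 100 tps, 2 s latency, 2 machines, 1 failure, 80% buffer, TPT 1.2.
print(round(concurrent_load(100, 2, gcs=2, mf=1, buffer_pct=80, tpt=1.2)))  # 600
```

All inputs should be peak or worst case values, for the reasons given in the next paragraph.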
Remember that all numbers must be taken from peak load or worst case scenarios. If peak load is 20X greater than average load, clearly average will generate incorrect numbers for anything other than sustaining the site under "average" conditions.
The following is an example from my Art of NAS training:
Assumptions
-----------
Brokerage firm where the difference between average and peak loads is extreme. Assumed some sort of a backend database... properly sized. I did not do any analysis of the network. However, there are numbers below so that one could analyze the network and make recommendations.
Method Used for Sizing
----------------------
The Ernie Park "rule of thumb algorithm for NAS/WEB Server sizing", with example by Mike Grove from Sun. Note, this should be used to get the initial config together. Actual testing should be done to ensure a proper architecture for the final config.
Variables
----------
Total Users = 1000 (Remember, total users is a marketing term, and has no real value for performance or sizing)
------------------------------------------
Concurrent Users = 500 (This number is also fluff, unless there are large session objects and very high transactional loads. This influences a subjective variable for fine tuning)
Click interval = Amount of time it takes for a user of a site to click each subsequent page after the first
Determine TPS if unknown
Peak Concurrent Users * click rate (clicks per user per second, i.e. 1 / click interval in seconds) = ETPS, or Estimated TPS
------------------------------------------
We assume that each user clicks once every 10 seconds, so it takes 10 users to create 1 click per second; equivalently, one user creates 1/10th of a click each second. As a result . . .
TPS = (.1 click per second interval) x (Concurrent Users)
50tps = (.1tps x 500)
Note: The number of clicks per 10 second interval is arbitrary. I normalize to clicks/second.
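The estimate is just concurrent users divided by the click interval in seconds. A minimal sketch, with a function name of my own choosing:

```python
def estimated_tps(concurrent_users: int, click_interval_s: float) -> float:
    """ETPS = concurrent users * clicks per user per second."""
    if click_interval_s <= 0:
        raise ValueError("click_interval_s must be positive")
    clicks_per_user_per_second = 1.0 / click_interval_s
    return concurrent_users * clicks_per_user_per_second


# 500 concurrent users, each clicking once every 10 seconds.
print(estimated_tps(500, 10))  # 50.0
```

The click interval is the assumption that matters most here, so it is worth measuring on a real site rather than guessing.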
--------------------------------------------
This is to determine the real numbers for calculation purposes. Where the Estimated TPS may have represented an average load condition, this is the worst case condition. If the previous ETPS or TPS was based on worst case, skip this or reverse calculate for ATPS, or Average TPS.
Average TPS
Average Concurrent Users = 40
Each user clicks once every 20 seconds
TPS = (.05 click per second interval) x (Average Concurrent Users)
TPS = (.05 ) x ( 40 ) = 2
Peak TPS = ATPS x (volatility load factor)
To determine Volatility Load Factor, divide Estimated or Actual Peak TPS by Average TPS.
Peak TPS/ATPS = VLF
100tps / 2tps =50
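The two relations above are inverses of each other, which a short sketch makes explicit. The function names are assumptions for illustration only.

```python
def volatility_load_factor(peak_tps: float, average_tps: float) -> float:
    """VLF = Peak TPS / ATPS."""
    if average_tps <= 0:
        raise ValueError("average_tps must be positive")
    return peak_tps / average_tps


def peak_tps(average_tps: float, vlf: float) -> float:
    """Peak TPS = ATPS x VLF."""
    return average_tps * vlf


vlf = volatility_load_factor(peak_tps=100, average_tps=2)
print(vlf)               # 50.0
print(peak_tps(2, vlf))  # 100.0
```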
Note: This is where things can go skyward. The multiplier, in this case 50, represents a VERY volatile site, where peak loads can consume LOTS of resources. In other words, this site must be 50 times larger to run under peak load than under average load. Under average load very few resources will be used, approximately 2%, versus the 80% of system resources required under severe load. You must understand the site you are sizing. For instance, for an intranet site where loads do not fluctuate radically, 2-5 would be a more likely multiplier. An online brokerage firm is a highly volatile environment, where the beginning and end of the trading day put a 50x load on the system compared with the middle hours. One must architect for the high volume hours. At high volume hours the system should be operating at approx. 40% to 80% utilization. During off peak hours the system might be at 2 - 10%. We need to ensure that peak hours are the driving requirement for the architecture.
--------------------------------------------
Size of Transaction = 6k
Note: I use this to size the network throughput
--------------------------------------------
Peak Throughput = (size of transaction) x (Peak TPS)
300,000 Bytes/s = 6,000 bytes x 50TPS
This is good for sizing the network throughput
--------------------------------------------
Avg Throughput = (size of transaction) x (ATPS)
12,000 Bytes/s = 6,000 x 2
Note: This is good for sizing the network throughput
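Both throughput figures are the same multiplication with different TPS inputs. The sketch below reproduces them and, as an addition of mine that is not in the original numbers, converts to bits per second, since network links are usually rated that way.

```python
def throughput_bytes_per_s(transaction_bytes: int, tps: float) -> float:
    """Throughput = (size of transaction) x (TPS)."""
    return transaction_bytes * tps


peak = throughput_bytes_per_s(6_000, 50)    # peak load
average = throughput_bytes_per_s(6_000, 2)  # average load

print(peak)      # 300000
print(average)   # 12000
print(peak * 8)  # 2400000 bits/s, roughly 2.4 Mbit/s before protocol overhead
```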
--------------------------------------------
Latency or thread buffer factor = 2 seconds
Mike Grove, Sun Microsystems . . .
"Rather than this representing the time delay to display a page, which Ernie explained to me, such that I actually understood... no easy task, ;) I like to think of this as the multiplying factor to use representing the number of reserve threads per second. That means if I only need to service 4,167 threads in a second, and I don't need to service any others, then that's all I need. However, transactions are coming in continuously, so I want to build in a buffer so that when the next 4,167 transactions come in I have enough threads available to service them versus letting them queue up. So this number represents the amount of buffer needed to handle "X" seconds of traffic before queuing starts. So what is needed in order to come up with a real factor here is to understand how long an average thread takes to execute before that thread goes back in the thread pool. In this example, I am building a 3 second reserve in the thread pool."
-------------------------------------------------
Client has trading site with . . .
100 tps at peak
2 seconds latency under that load
80% buffer for application server executions
They want a big bandwidth backend and they want to be able to survive multiple failures. They want to size the backend at 5 application servers in a cluster (GCS), but sized to survive the real time failure of up to 2 machines (MF).
In conclusion, 3 servers have to be sized to handle the workload of 5 machines in the event of a worst case multimachine failure.
They have 200 objects invoked under normal site use, and those objects can use a total of 240 threads.
((Buffer %) * ((GCS - MF) / GCS)) to determine peak application server sized load per each machine of five, assuming no failures
80% * ((5 - 2) / 5)
80% * (3 / 5)
80% * .6
48% application server load per machine
(((TPS * Latency) / ((GCS - MF) / GCS)) / (Buffer% / 100%)) * (TPT)
(((100 t/sec * 2 sec) / ((5 - 2) / 5)) / (80% / 100%)) * 1.2
((200 / ( 3 / 5)) / .8) * 1.2
((200 / .6) / .8) * 1.2
(334 / .8) * 1.2
418 * 1.2
502 transactions - This is the concurrent volume that each application server has to handle.
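The intermediate values in this example (334, 418, 502) appear to be rounded up at every step. The sketch below compares that stepwise rounding with carrying the exact value through; the function names are my own, and Fraction is used to avoid floating point noise.

```python
from fractions import Fraction
from math import ceil


def stepwise_rounded(tps, latency_s, gcs, mf, buffer_pct, tpt):
    """Round up after every step, as the worked example appears to do."""
    step1 = ceil(Fraction(tps * latency_s) / Fraction(gcs - mf, gcs))
    step2 = ceil(Fraction(step1) / Fraction(buffer_pct, 100))
    return ceil(step2 * Fraction(tpt))


def exact(tps, latency_s, gcs, mf, buffer_pct, tpt):
    """Carry exact fractions through and do not round."""
    load = Fraction(tps * latency_s) / Fraction(gcs - mf, gcs)
    return load / Fraction(buffer_pct, 100) * Fraction(tpt)


print(stepwise_rounded(100, 2, 5, 2, 80, "1.2"))  # 502
print(exact(100, 2, 5, 2, 80, "1.2"))             # 500
```

The difference is small and errs on the side of more capacity, which fits a rule of thumb meant for an initial config.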
--------------------------------------------------
iAS Sizing and Configuration Guidelines
Java Servers: 1 - 4 Servers per CPU
Threads: per server, min - max settings
Memory: 1GB to 1CPU NAS, .5GB to 1CPU Web Svr
NAS CPU's to Web Svr CPU's = 1:1
-----------------------------------------------------------------------------------
This client uses mostly stored procedure calls to get data, and there is a small dependency on LDAP.
Based on 2 second latency, each CPU can run 2 Java Servers (VM's). Threading will be set to 48 - 64 based on the chart.
CPU's = ((transactions) /(max threads per VM * VMs per CPU)) * (Gross Cluster Size)
CPU's = ( 502 / (64 * 2)) * 5
CPU's = (502 / 128) * 5
CPU's = 4 cpu's per machine (3.92 rounded up) * 5 machines
CPU's = 20
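The CPU calculation can be sketched as below. The names are assumptions; the only detail added is making the round-up to whole CPUs per machine explicit, which the worked numbers imply (502 / 128 is about 3.92, shown as 4).

```python
from math import ceil


def total_cpus(concurrent_load: float, max_threads_per_vm: int,
               vms_per_cpu: int, gcs: int) -> int:
    """CPU's = ((transactions) / (max threads per VM * VMs per CPU)) * GCS."""
    threads_per_cpu = max_threads_per_vm * vms_per_cpu
    cpus_per_machine = ceil(concurrent_load / threads_per_cpu)  # whole CPUs only
    return cpus_per_machine * gcs


# 502 concurrent transactions, 64 max threads per VM, 2 VMs per CPU, 5 machines.
print(total_cpus(502, max_threads_per_vm=64, vms_per_cpu=2, gcs=5))  # 20
```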
And finally, this is a real example of a discount brokerage firm, with only the name omitted
Client has trading site with . . .
350 tps at peak
6 seconds latency under that load
80% buffer for application server executions
They want a big bandwidth backend and they want to be able to survive multiple failures. They want to size the backend at 4 application servers in a cluster (GCS), but sized to survive the real time failure of up to 1 machine (MF).
In conclusion, 3 servers have to be sized to handle the workload of 4 machines in the event of a worst case machine failure.
They have 200 objects invoked under normal site use, and those objects can use a total of 400 threads.
((Buffer %) * ((GCS - MF) / GCS)) to determine peak application server sized load per each machine of four, assuming no failures
80% * ((4 - 1) / 4)
80% * (3 / 4)
80% * .75
60% application server load per machine
(((TPS * Latency) / ((GCS - MF) / GCS)) / (Buffer% / 100%)) * (TPT)
(((350 t/sec * 6 sec) / ((4 - 1) / 4)) / (80% / 100%)) * 2.0
((2100t / ( 3 / 4)) / .8) * 2.0
((2100t / .75) / .8) * 2.0
(2800t / .8) * 2.0
3500t * 2.0
7000t (or transactions) - This is the concurrent volume that each application server has to handle.
This client uses mostly stored procedure calls to get data from a load-sensitive mainframe, and there is a high dependency on a proprietary repository of realtime data for quotes.
Based on 6 second latency, each CPU can run 1 Java Server (VM). Threading will be set to 24 - 32 based on the chart.
CPU's = ((transactions) /(max threads per VM * VMs per CPU)) * (Gross Cluster Size)
CPU's = ( 7000t / (32 * 1)) * 4
CPU's = (7000t / 32) * 4
CPU's = 219 cpu's per machine * 4 machines
CPU's = 876
In this case, we would have to divide the load per cluster to get the CPU's per machine to reasonable levels. The main reason that this number is so high is that the latency is very long.
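To see how strongly latency drives the result, the sizing formula can be re-run with the latency as the only thing that changes. This is a sketch: the 3 and 2 second latencies are hypothetical values of mine for comparison, not measurements from this client, and the function name is an assumption.

```python
def concurrent_load(tps: float, latency_s: float, gcs: int, mf: int,
                    buffer_pct: float, tpt: float) -> float:
    """(((TPS * Latency) / ((GCS - MF) / GCS)) / (Buffer% / 100%)) * TPT."""
    surviving_fraction = (gcs - mf) / gcs
    return (tps * latency_s / surviving_fraction) / (buffer_pct / 100.0) * tpt


# Same client (350 tps, cluster of 4, 1 failure, 80% buffer, TPT 2.0).
for latency_s in (6, 3, 2):  # 3 and 2 are hypothetical, for comparison only
    load = concurrent_load(350, latency_s, gcs=4, mf=1, buffer_pct=80, tpt=2.0)
    print(latency_s, round(load))
# 6 s -> 7000, 3 s -> 3500, 2 s -> 2333
```

The concurrent volume scales linearly with latency, so cutting the backend response time cuts the thread requirement by the same proportion, before any change in VMs per CPU or threads per VM is considered.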
Mike Grove, Sun Microsystems . . .
"Note: This was the recommended number of engines and threads to run on a single CPU (UltraSparc 400Mhz). This could also change based on the underlying JVM, OS, or application. Ernie found that this number was good enough to get a baseline configuration together that would be in the ballpark."
-----------------------------------------------
Mike Grove, Sun Microsystems . . .
"Note: You may think that 148+ CPU's to handle 35,000 concurrent users at peak is excessive. I did, until Ernie pointed out that using an example like a brokerage firm requires a large "volatility load factor". If this was an intranet with little difference between peak and average loads, the "volatility load factor" would be much lower, i.e. in the (2-4) range. This would reduce the number of needed CPU's significantly.
Memory usage should be specified at 1GB per CPU to begin with. This could change based on actual site testing. If process swapping is occurring, or other memory related trouble areas pop up, then more memory may be needed. Again, this is to get in the ballpark with a configuration, not the final config."
WEB Server Rule of Thumb:
For every NAS CPU there should be a WEB Server CPU.
Memory on the WEB Server CPU can be 1/2 the memory used on each NAS CPU.
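The web server rule of thumb is simple enough to sketch in a few lines. The function name and the returned keys are assumptions; the 1GB per NAS CPU default is taken from the guidelines earlier in this document.

```python
def web_tier(nas_cpus: int, nas_gb_per_cpu: float = 1.0) -> dict:
    """One Web Server CPU per NAS CPU, with half the memory per CPU."""
    web_gb_per_cpu = nas_gb_per_cpu / 2
    return {
        "web_cpus": nas_cpus,                       # 1:1 with NAS CPUs
        "web_gb_per_cpu": web_gb_per_cpu,
        "web_total_gb": nas_cpus * web_gb_per_cpu,
    }


# The 20 NAS CPU configuration from the five machine example.
print(web_tier(20))
# {'web_cpus': 20, 'web_gb_per_cpu': 0.5, 'web_total_gb': 10.0}
```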
Notice that for single machine deployments, they are still sized over capacity. This allows them to operate at 80% utilization at calculated peak load instead of 100%. This is important to explain to customers. Also, it will sell more hardware and software, since clients don't really know what peak load is, and hardware is cheaper than finding out that peak load is more than the current hardware can handle.