to the Postfix-owned data_directory. File: global/data_redirect.c.
Lots of pathname fixes in the examples of TLS_README and
- postconf(5); -lm library screw-up in 8qmgr/Makefile.in.
+ postconf(5); -lm library screw-up in queue manager Makefiles.
+
+20071207
+
+ Cleanup: pathname fixes in documentation; unnecessary queue
+ scan in the queue manager rate limiter; inverse square root
+ feedback in the queue manager concurrency scheduler. Files:
+ mantools/postlink, proto/TLS_README.html, *qmgr/qmgr_queue.c.
+
+ All changes up to this point should be ready for Postfix 2.5.
+
+	Documentation: updated nqmgr preemptive scheduler documentation
+ by Patrik Rak. File: proto/SCHEDULER_README.html.
and removes mail from the queue after the last delivery attempt. There are two
major classes of mechanisms that control the operation of the queue manager.
-The first class of mechanisms is concerned with the number of concurrent
-deliveries to a specific destination, including decisions on when to suspend
-deliveries after persistent failures:
+ * Concurrency scheduling is concerned with the number of concurrent
+ deliveries to a specific destination, including decisions on when to
+ suspend deliveries after persistent failures.
+ * Preemptive scheduling is concerned with the selection of email messages and
+ recipients for a given destination.
+ * Credits. This document would not be complete without them.
- * Concurrency scheduling
-
- o Summary of the Postfix 2.5 concurrency feedback algorithm
- o Summary of the Postfix 2.5 "dead destination" detection algorithm
- o Pseudocode for the Postfix 2.5 concurrency scheduler
- o Results for delivery to concurrency limited servers
- o Discussion of concurrency limited server results
- o Limitations of less-than-1 per delivery feedback
- o Concurrency configuration parameters
-
-The second class of mechanisms is concerned with the selection of what mail to
-deliver to a given destination:
-
- * Preemptive scheduling
+C\bCo\bon\bnc\bcu\bur\brr\bre\ben\bnc\bcy\by s\bsc\bch\bhe\bed\bdu\bul\bli\bin\bng\bg
- o Why the non-preemptive Postfix queue manager was replaced
- o How the non-preemptive queue manager scheduler works
+The following sections document the Postfix 2.5 concurrency scheduler, after a
+discussion of the limitations of the existing concurrency scheduler. This is
+followed by results of medium-concurrency experiments, and a discussion of
+trade-offs between performance and robustness.
-And this document would not be complete without:
+The material is organized as follows:
- * Credits
+ * Drawbacks of the existing concurrency scheduler
+ * Summary of the Postfix 2.5 concurrency feedback algorithm
+ * Summary of the Postfix 2.5 "dead destination" detection algorithm
+ * Pseudocode for the Postfix 2.5 concurrency scheduler
+ * Results for delivery to concurrency limited servers
+ * Discussion of concurrency limited server results
+ * Limitations of less-than-1 per delivery feedback
+ * Concurrency configuration parameters
-C\bCo\bon\bnc\bcu\bur\brr\bre\ben\bnc\bcy\by s\bsc\bch\bhe\bed\bdu\bul\bli\bin\bng\bg
+D\bDr\bra\baw\bwb\bba\bac\bck\bks\bs o\bof\bf t\bth\bhe\be e\bex\bxi\bis\bst\bti\bin\bng\bg c\bco\bon\bnc\bcu\bur\brr\bre\ben\bnc\bcy\by s\bsc\bch\bhe\bed\bdu\bul\ble\ber\br
-This section documents the Postfix 2.5 concurrency scheduler. Prior Postfix
-versions used a simple but robust algorithm where the per-destination delivery
-concurrency was decremented by 1 after a delivery suffered connection or
-handshake failure, and was incremented by 1 otherwise. Of course the
-concurrency was never allowed to exceed the maximum per-destination concurrency
-limit. And when a destination's concurrency level dropped to zero, the
-destination was declared "dead" and delivery was suspended.
+From the start, Postfix has used a simple but robust algorithm where the per-
+destination delivery concurrency is decremented by 1 after a delivery suffered
+connection or handshake failure, and incremented by 1 otherwise. Of course the
+concurrency is never allowed to exceed the maximum per-destination concurrency
+limit. And when a destination's concurrency level drops to zero, the
+destination is declared "dead" and delivery is suspended.
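+
+As a minimal illustration (not the actual Postfix source; the names are
+made up), this +/-1 feedback rule amounts to:
+
+    /* Sketch of the pre-2.5 +/-1 concurrency feedback; illustrative only. */
+    typedef struct {
+        int concurrency;                /* current delivery concurrency */
+        int limit;                      /* maximum per-destination concurrency */
+        int dead;                       /* non-zero: delivery is suspended */
+    } DEST;
+
+    void feedback(DEST *d, int failed)  /* connection or handshake failure? */
+    {
+        if (failed) {
+            if (--d->concurrency <= 0) {
+                d->concurrency = 0;
+                d->dead = 1;            /* destination declared "dead" */
+            }
+        } else if (d->concurrency < d->limit) {
+            d->concurrency += 1;        /* never exceed the limit */
+        }
+    }
+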
-Drawbacks of the old +/-1 feedback per delivery are:
+Drawbacks of +/-1 concurrency feedback per delivery are:
* Overshoot due to exponential delivery concurrency growth with each pseudo-
- cohort(*). For example, with the default initial concurrency of 5,
- concurrency would proceed over time as (5-10-20).
+ cohort(*). This can be an issue with high-concurrency channels. For
+ example, with the default initial concurrency of 5, concurrency would
+ proceed over time as (5-10-20).
* Throttling down to zero concurrency after a single pseudo-cohort(*)
failure. This was especially an issue with low-concurrency channels where a
All results in the previous sections are based on the first delivery runs only;
they do not include any second etc. delivery attempts. The first two examples
-show that the feedback method matters little when concurrency is limited due to
-congestion. This is because the initial concurrency is already at the client's
-concurrency maximum, and because there is 10-100 times more positive than
-negative feedback. Under these conditions, the contribution from SMTP
-connection caching is negligible.
+show that the effect of feedback is negligible when concurrency is limited due
+to congestion. This is because the initial concurrency is already at the
+client's concurrency maximum, and because there is 10-100 times more positive
+than negative feedback. Under these conditions, it is no surprise that the
+contribution from SMTP connection caching is also negligible.
In the last example, the old +/-1 feedback per delivery will defer 50% of the
mail when confronted with an active (anvil-style) server concurrency limit,
P\bPr\bre\bee\bem\bmp\bpt\bti\biv\bve\be s\bsc\bch\bhe\bed\bdu\bul\bli\bin\bng\bg
-This is the beginning of documentation for a preemptive queue manager
-scheduling algorithm by Patrik Rak. For a long time, this code was made
-available under the name "nqmgr(8)" (new queue manager), as an optional module.
-As of Postfix 2.1 this is the default queue manager, which is always called
-"qmgr(8)". The old queue manager will for some time will be available under the
-name of "oqmgr(8)".
-
-W\bWh\bhy\by t\bth\bhe\be n\bno\bon\bn-\b-p\bpr\bre\bee\bem\bmp\bpt\bti\biv\bve\be P\bPo\bos\bst\btf\bfi\bix\bx q\bqu\bue\beu\bue\be m\bma\ban\bna\bag\bge\ber\br w\bwa\bas\bs r\bre\bep\bpl\bla\bac\bce\bed\bd
-
-The non-preemptive Postfix scheduler had several limitations due to unfortunate
-choices in its design.
-
- 1. Round-robin selection by destination for mail that is delivered via the
- same message delivery transport. The round-robin strategy was chosen with
- the intention to prevent a single (destination) site from using up too many
- mail delivery resources. However, that strategy penalized inbound mail on
- bi-directional gateways. The poor suffering inbound destination would be
- selected only 1/number-of-destinations of the time, even when it had more
- mail than other destinations, and thus mail could be delayed.
-
- Victor Duchovni found a workaround: use different message delivery
- transports, and thus avoid the starvation problem. The Patrik Rak scheduler
- solves this problem by using FIFO selection.
-
- 2. A second limitation of the old Postfix scheduler was that delivery of bulk
- mail would block all other deliveries, causing large delays. Patrik Rak's
- scheduler allows mail with fewer recipients to slip past bulk mail in an
- elegant manner.
-
-H\bHo\bow\bw t\bth\bhe\be n\bno\bon\bn-\b-p\bpr\bre\bee\bem\bmp\bpt\bti\biv\bve\be q\bqu\bue\beu\bue\be m\bma\ban\bna\bag\bge\ber\br s\bsc\bch\bhe\bed\bdu\bul\ble\ber\br w\bwo\bor\brk\bks\bs
-
-The following text is from Patrik Rak and should be read together with the
-postconf(5) manual that describes each configuration parameter in detail.
-
-From user's point of view, oqmgr(8) and qmgr(8) are both the same, except for
-how next message is chosen when delivery agent becomes available. You already
-know that oqmgr(8) uses round-robin by destination while qmgr(8) uses simple
-FIFO, except for some preemptive magic. The postconf(5) manual documents all
-the knobs the user can use to control this preemptive magic - there is nothing
-else to the preemption than the quite simple conditions described in there.
-
-As for programmer-level documentation, this will have to be extracted from all
-those emails we have exchanged with Wietse [rats! I hoped that Patrik would do
-the work for me -- Wietse] But I think there are no missing bits which we have
-not mentioned in our conversations.
-
-However, even from programmer's point of view, there is nothing more to add to
-the message scheduling idea itself. There are few things which make it look
-more complicated than it is, but the algorithm is the same as the user
-perceives it. The summary of the differences of the programmer's view from the
-user's view are:
-
- 1. Simplification of terms for users: The user knows about messages and
- recipients. The program itself works with jobs (one message is split among
- several jobs, one per each transport needed to deliver the message) and
- queue entries (each entry may group several recipients for same
- destination). Then there is the peer structure introduced by qmgr(8) which
- is simply per-job analog of the queue structure.
-
- 2. Dealing with concurrency limits: The actual implementation is complicated
- by the fact that the messages (resp. jobs) may not be delivered in the
- exactly scheduled order because of the concurrency limits. It is necessary
- to skip some "blocker" jobs when the concurrency limit is reached and get
- back to them again when the limit permits.
-
- 3. Dealing with resource limits: The actual implementation is complicated by
- the fact that not all recipients may be read in-core. Therefore each
- message has some recipients in-core and some may remain on-file. This means
- that a) the preemptive algorithm needs to work with recipient count
- estimates instead of exact counts, b) there is extra code which needs to
- manipulate the per-transport pool of recipients which may be read in-core
- at the same time, and c) there is extra code which needs to be able to read
- recipients into core in batches and which is triggered at appropriate
- moments.
-
- 4. Doing things efficiently: All important things I am aware of are done in
- the minimum time possible (either directly or at least when amortized
- complexity is used), but to choose which job is the best candidate for
- preempting the current job requires linear search of up to all transport
- jobs (the worst theoretical case - the reality is much better). As this is
- done every time the next queue entry to be delivered is about to be chosen,
- it seemed reasonable to add cache which minimizes the overhead. Maintenance
- of this candidate cache slightly obfuscates things.
-
-The points 2 and 3 are those which made the implementation (look) complicated
-and were the real coding work, but I believe that to understand the scheduling
-algorithm itself (which was the real thinking work) is fairly easy.
+This document attempts to describe the new queue manager and its preemptive
+scheduler algorithm. Note that the document was originally written to describe
+the changes between the new queue manager (in this text referred to as nqmgr,
+the name it was known by before it became the default queue manager) and the
+old queue manager (referred to as oqmgr). This is why it refers to oqmgr every
+so often.
+
+This document is divided into sections as follows:
+
+ * The structures used by nqmgr
+ * What happens when nqmgr picks up the message - how it is assigned to
+ transports, jobs, peers, entries
+ * How does the entry selection work
+ * How does the preemption work - what messages may be preempted and how and
+ what messages are chosen to preempt them
+ * How destination concurrency limits affect the scheduling algorithm
+ * Dealing with memory resource limits
+
+T\bTh\bhe\be s\bst\btr\bru\buc\bct\btu\bur\bre\bes\bs u\bus\bse\bed\bd b\bby\by n\bnq\bqm\bmg\bgr\br
+
+Let's start by recapitulating the structures and terms used when
+referring to the queue manager and how it operates. Many of these are
+partially described elsewhere, but it is nice to have a coherent
+overview in one place:
+
+ * Each message structure represents one mail message which Postfix is to
+ deliver. The message recipients specify to what destinations the message
+ is to be delivered and what transports are going to be used for the
+ delivery.
+
+ * Each recipient entry groups a batch of recipients of one message which are
+ all going to be delivered to the same destination.
+
+ * Each transport structure groups everything that is going to be delivered
+ by delivery agents dedicated to that transport. Each transport maintains a
+ set of queues (describing the destinations it shall talk to) and jobs
+ (referencing the messages it shall deliver).
+
+ * Each transport queue (not to be confused with the on-disk active queue or
+ incoming queue) groups everything that is going to be delivered to a
+ given destination (aka nexthop) by its transport. Each queue belongs to
+ one transport, so each destination may be referred to by several queues,
+ one for each transport. Each queue maintains a list of all recipient
+ entries (batches of message recipients) which shall be delivered to the
+ given destination (the todo list), and a list of recipient entries
+ already being delivered by the delivery agents (the busy list).
+
+ * Each queue corresponds to multiple peer structures. Each peer structure is
+ like the queue structure, belonging to one transport and referencing one
+ destination. The difference is that it lists only the recipient entries
+ which all originate from the same message, unlike the queue structure,
+ whose entries may originate from various messages. For messages with few
+ recipients, there is usually just one recipient entry for each destination,
+ resulting in one recipient entry per peer. But for large mailing list
+ messages the recipients may need to be split into multiple recipient
+ entries, in which case the peer structure may list many entries for a
+ single destination.
+
+ * Each transport job groups everything it takes to deliver one message via
+ its transport. Each job represents one message within the context of the
+ transport. The job belongs to one transport and message, so each message
+ may have multiple jobs, one for each transport. The job groups all the peer
+ structures, which describe the destinations the job's message has to be
+ delivered to.
+
+The first four structures are common to both nqmgr and oqmgr, the latter two
+were introduced by nqmgr.
+
+These terms are used extensively in the text below; feel free to look up
+the descriptions above any time you feel you have lost track of what is
+what.
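+
+The following sketch shows how these structures reference each other. The
+names are illustrative only, not the actual definitions from the qmgr
+sources:
+
+    /* Illustrative sketch of the nqmgr structures; not the real qmgr.h. */
+    struct message;                 /* one mail message to be delivered */
+    struct transport;               /* one delivery agent type */
+
+    struct entry {                  /* batch of recipients: one message, */
+        struct message *message;    /* one destination */
+        struct queue *queue;        /* listed on the queue's todo/busy list */
+        struct peer *peer;          /* also listed on one peer */
+    };
+
+    struct queue {                  /* per-transport, per-destination */
+        struct transport *transport;
+        struct entry *todo;         /* entries waiting for delivery */
+        struct entry *busy;         /* entries being delivered */
+    };
+
+    struct peer {                   /* per-job view of one destination */
+        struct job *job;
+        struct queue *queue;
+        struct entry *entries;      /* this job's entries for this queue */
+    };
+
+    struct job {                    /* one message within one transport */
+        struct transport *transport;
+        struct message *message;
+        struct peer *peers;         /* destinations of this job's message */
+    };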
+
+W\bWh\bha\bat\bt h\bha\bap\bpp\bpe\ben\bns\bs w\bwh\bhe\ben\bn n\bnq\bqm\bmg\bgr\br p\bpi\bic\bck\bks\bs u\bup\bp t\bth\bhe\be m\bme\bes\bss\bsa\bag\bge\be
+
+Whenever nqmgr moves a queue file into the active queue, the following happens:
+It reads all necessary information from the queue file as oqmgr does, and also
+reads as many recipients as possible - more on that later, for now let's just
+pretend it always reads all recipients.
+
+Then it resolves the recipients as oqmgr does, which means obtaining (address,
+nexthop, transport) triple for each recipient. For each triple, it finds the
+transport; if it does not exist yet, it instantiates it (unless it's dead).
+Within the transport, it finds the destination queue for the given
+nexthop; if it does not exist yet, it instantiates it (unless it's dead).
+The triple is then bound to the given destination queue. This happens in
+qmgr_resolve() and is
+basically the same as in oqmgr.
+
+Then for each triple which was bound to some queue (and thus transport), the
+program finds the job which represents the message within that transport's
+context; if it does not exist yet, it instantiates it. Within the job, it finds
+the peer which represents the bound destination queue within this job's
+context;
+if it does not exist yet, it instantiates it. Finally, it stores the address
+from the resolved triple to the recipient entry which is appended to
+both the queue entry list and the peer entry list. The addresses for the
+same nexthop are batched in the entries up to the destination recipient
+limit (transport_destination_recipient_limit) for that transport. This
+happens in qmgr_assign() and, apart from the handling of the job and
+peer structures, is basically the same as in oqmgr.
+
+When the job is instantiated, it is enqueued on the transport's job list based
+on the time its message was picked up by nqmgr. For the first batch of
+recipients this means it is appended to the end of the job list. The
+ordering of the job list by enqueue time is important, as we will see
+shortly.
+
+[Now you should have a pretty good idea of the state of the nqmgr after a
+couple of messages have been picked up, and of the relations between all
+those job, peer, queue and entry structures.]
+
+H\bHo\bow\bw d\bdo\boe\bes\bs t\bth\bhe\be e\ben\bnt\btr\bry\by s\bse\bel\ble\bec\bct\bti\bio\bon\bn w\bwo\bor\brk\bk
+
+Having prepared all the structures described above, the task of the
+nqmgr's scheduler is to choose the recipient entries one at a time and
+pass them to the delivery agent for the corresponding transport. Now how
+does this work?
+
+The first approximation of the new scheduling algorithm is like this:
+
+ foreach transport (round-robin-by-transport)
+ do
+ if transport busy continue
+ if transport process limit reached continue
+ foreach transport's job (in the order of the transport's job list)
+ do
+ foreach job's peer (round-robin-by-destination)
+ do
+ if peer->queue->concurrency < peer->queue->window
+ return next peer entry.
+ done
+ done
+ done
+
+Now what is the "order of the transport's job list"? As we know already, the
+job list is by default kept in the order the message was picked up by the
+nqmgr. So by default we get the top-level round-robin transport, and within
+each transport we get the FIFO message delivery. The round-robin of the
+peers by destination is perhaps of little importance in most real-life
+cases (unless the destination recipient limit is reached, there is only
+one recipient entry per destination within a job), but theoretically it
+makes sure that even within a single job, destinations are treated fairly.
+
+[By now you should have a feeling that you really know how the scheduler
+works under ideal conditions - that is, no recipient resource limits and
+no destination concurrency problems - except for the preemption.]
+
+H\bHo\bow\bw d\bdo\boe\bes\bs t\bth\bhe\be p\bpr\bre\bee\bem\bmp\bpt\bti\bio\bon\bn w\bwo\bor\brk\bk
+
+As you might perhaps expect by now, the transport's job list does not remain
+sorted by the job's message enqueue time all the time. The coolest thing
+about nqmgr is not the simple FIFO delivery, but that it is able to slip
+mail with few recipients past the mailing-list bulk mail. This is what the
+job preemption is about - shuffling the jobs on the transport's job list
+to get the best message delivery rates. Now how is it achieved?
+
+First I have to tell you that there are in fact two job lists in each
+transport. One is the scheduler's job list, which the scheduler is free to play
+with, while the other one keeps the jobs always listed in the order of
+the enqueue time and is used for the recipient pool management which we
+will discuss later.
+For now, we will deal with the scheduler's job list only.
+
+So, we have the job list, which is first ordered by the time the job's messages
+were enqueued, oldest messages first, the most recently picked one at the end.
+For now, let's assume that there are no destination concurrency problems.
+Without preemption, we pick some entry of the first (oldest) job on the
+queue, assign it to a delivery agent, pick another one from the same job,
+assign it again, and so on, until all the entries are used and the job is
+delivered. We would then move on to the next job, and so on and on. Now
+how do we manage to sneak in some entries from the recently added jobs
+when the first job on the job list belongs to a message going to a
+mailing list and has thousands of recipient entries?
+
+The nqmgr's answer is that we can artificially "inflate" the delivery time of
+that first job by some constant for free - it is basically the same trick you
+might remember as "accumulation of potential" from the amortized complexity
+lessons. For example, instead of delivering the entries of the first job
+on the job list every time a delivery agent becomes available, we can do
+it only every second time. If you view the moments the delivery agent
+becomes available
+on a timeline as "delivery slots", then instead of using every delivery slot
+for the first job, we can use only every other slot, and still the overall
+delivery efficiency of the first job remains the same. So the delivery 11112222
+becomes 1.1.1.1.2.2.2.2 (1 and 2 are the imaginary job numbers, . denotes the
+free slot). Now what do we do with free slots?
+
+As you might have guessed, we will use them for sneaking in the mail with
+few recipients. For example, if we have one four-recipient mail followed
+by four one-recipient mails, the delivery sequence (that is, the sequence
+in which the jobs are assigned to the delivery slots) might look like
+this: 12131415. Hmm, fine for sneaking in the single recipient mail, but
+how do we sneak in the mail with more than one recipient? Say we have one
+four-recipient mail followed by two two-recipient mails?
+
+The simple answer would be to use the delivery sequence 12121313. But the
+problem is that this does not scale well. Imagine you have mail with a
+thousand recipients followed by mail with a hundred recipients. It is
+tempting to suggest a delivery sequence like 121212...., but alas! Imagine
+there arrives another mail with say ten recipients. There are no free
+slots anymore, so it can't slip by, not even if it had just one recipient.
+It will be stuck until the hundred-recipient mail is delivered, which
+really sucks.
+
+So, it becomes obvious that while inflating the message to get free
+slots is a great idea, one has to be really careful about how the free
+slots are assigned, otherwise one might corner oneself. So, how does
+nqmgr really use the free slots?
+
+The key idea is that one does not have to generate the free slots in a uniform
+way. The delivery sequence 111...1 is no worse than 1.1.1.1, in fact, it is
+even better as some entries are in the first case selected earlier than in the
+second case, and none is selected later! So it is possible first to
+"accumulate" the free delivery slots and then use them all at once. It is even
+possible to accumulate some, then use them, then accumulate some more and use
+them again, as in 11..1.1 .
+
+Let's get back to the one hundred recipient example. We now know that we could
+first accumulate one hundred free slots, and only then preempt the
+first job and sneak the one hundred recipient mail in. Applying the algorithm
+recursively, we see the hundred recipient job can accumulate ten free delivery
+slots, and then we could preempt it and sneak in the ten recipient mail... Wait
+wait wait! Could we? Aren't we overinflating the original one thousand
+recipient mail?
+
+Well, although it looks that way at first glance, another trick will
+allow us to answer "no, we are not!". If we had said that we will inflate
+the delivery time twice at maximum, and then consider every other slot as
+a free slot, then we would overinflate in case of recursive preemption.
+BUT! The trick is that if we use only every n-th slot as a free slot for
+n>2, there is always some worst inflation factor which we can guarantee
+not to be breached, even if we apply the algorithm recursively. To be
+precise, if for every k>1 normally used slots we accumulate one free
+delivery slot, then the inflation factor is not
+worse than k/(k-1) no matter how many recursive preemptions happen. And it's
+not worse than (k+1)/k if only non-recursive preemption happens. Now, having
+got through the theory and the related math, let's see how nqmgr implements
+this.
+
+Each job has a so-called "available delivery slot" counter. Each
+transport has a transport_delivery_slot_cost parameter, which defaults to
+the default_delivery_slot_cost parameter (5 by default). This is the k
+from the paragraph above. Each time k entries of the job are selected for
+delivery, this counter is incremented by one. Once some slots have
+accumulated, a job which requires no more than that amount of slots to be
+fully delivered may preempt this job.
+
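+For example, with the default slot cost k = 5 just described, the bounds
+derived above work out as
+
+    k/(k-1) = 5/4 = 1.25    (worst case, with recursive preemption)
+    (k+1)/k = 6/5 = 1.20    (worst case, non-recursive preemption only)
+
+so a job with 1000 recipient entries is guaranteed to complete within
+1250 delivery slots, no matter how often it is preempted.
+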
+[Well, the truth is, the counter is incremented every time an entry is selected
+and it is divided by k when it is used. Or, more precisely, there is no
+division; the other side of the equation is multiplied by k instead. But
+for understanding it's good enough to use the above approximation of the
+truth.]
+
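+In code, the bookkeeping from the note above might look like this (an
+illustrative sketch, not the actual qmgr source):
+
+    /* Raw slot counter: incremented once per selected entry. The division
+     * by k is avoided by multiplying the other side of the test by k. */
+    typedef struct {
+        long slots;                 /* raw available-delivery-slot counter */
+        long entries;               /* recipient entries left to deliver */
+    } JOB;
+
+    void job_entry_selected(JOB *job)
+    {
+        job->slots += 1;
+        job->entries -= 1;
+    }
+
+    /* May "candidate" preempt "current", given slot cost k? */
+    int job_can_preempt(const JOB *current, const JOB *candidate, long k)
+    {
+        return candidate->entries * k <= current->slots;
+    }
+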
+OK, so now we know the conditions which must be satisfied so one job can
+preempt another one. But what job gets preempted, how do we choose what
+job preempts it if there are several valid candidates, and when exactly
+does all this happen?
+
+The answer for the first part is simple. The job whose entry was
+selected last is the so-called current job. Normally, it is the first
+job on the
+scheduler's job list, but destination concurrency limits may change this as we
+will see later. It is always only the current job which may get preempted.
+
+Now for the second part. The current job has a certain amount of
+recipient entries, and as such may accumulate at most a certain amount of
+available delivery slots. It might have already accumulated some, and
+perhaps even already used some when it was preempted before (remember a
+job can be preempted several times). In either case, we know how many are
+accumulated and how many are left to deliver, so we know how many it may
+yet accumulate at most. Every other job which may be delivered with less
+than that amount of slots is a valid candidate for preemption. How do we
+choose among them?
+
+The answer is - the one with the maximum enqueue_time/recipient_entry_count
+ratio, where enqueue_time measures how long the job has been waiting. That
+is, the older the job is, the more we should try to deliver it in order
+to get the best message delivery rates. These rates are of course subject
+to how many recipients the message has, hence the division by the
+recipient (entry) count. No one shall be surprised that a message with n
+recipients takes n times longer to deliver than a message with one
+recipient.
+
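+Extending the sketch above with an enqueue time, the selection rule might
+be coded like this (again an illustrative sketch in the spirit of
+qmgr_choose_candidate(), not its actual implementation):
+
+    #include <time.h>
+
+    typedef struct {
+        time_t enqueue_time;        /* when the job's message was picked up */
+        long entries;               /* estimated recipient entries left */
+    } CJOB;
+
+    /* Among the jobs fully deliverable within max_slots (the most the
+     * current job may yet accumulate), pick the one with the highest
+     * waiting time per recipient entry. The real scheduler also skips
+     * the current job itself and any blocker jobs. */
+    CJOB *choose_candidate(CJOB *jobs, int njobs, long max_slots, time_t now)
+    {
+        CJOB *best = 0;
+        double best_score = 0;
+
+        for (int i = 0; i < njobs; i++) {
+            if (jobs[i].entries <= 0 || jobs[i].entries > max_slots)
+                continue;           /* already done, or does not fit */
+            double score = (double) (now - jobs[i].enqueue_time)
+                / jobs[i].entries;
+            if (best == 0 || score > best_score) {
+                best = &jobs[i];
+                best_score = score;
+            }
+        }
+        return best;
+    }
+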
+Now let's recap the previous two paragraphs. Isn't it too complicated?
+Why aren't the candidates drawn only from the jobs which can be delivered
+within the amount of slots the current job has already accumulated? Why
+do we need to estimate how much it has yet to accumulate? If you found
+out the answer, congratulate yourself. If we did it the simple way, we
+would always choose the candidate with the fewest recipient entries. If
+there were enough single-recipient mails coming in, they would always
+slip by the bulk mail as soon as possible, and mail with two or more
+recipients would never get a chance, no matter how long it had been
+sitting around in the job list.
+
+This candidate selection has an interesting implication - when we choose
+the best candidate for preemption (this is done in
+qmgr_choose_candidate()), we may not be able to use it for preemption
+immediately. This leads to an answer to the last part of the original
+question - when does the preemption happen?
+
+The preemption attempt happens every time the transport's next recipient
+entry is to be chosen for delivery. To avoid needless overhead, the
+preemption is not attempted if the current job could never accumulate
+more than transport_minimum_delivery_slots (defaults to
+default_minimum_delivery_slots, which defaults to 3). If enough slots
+have already accumulated to preempt the current job with the chosen best
+candidate, this is done immediately: the candidate is moved in front of
+the current job on the scheduler's job list, and the accumulated slot
+counter is decreased by the amount used by the candidate. If there are
+not enough slots... well, I could say that nothing happens and another
+preemption is attempted the next time. But that's not the complete truth.
+
+The truth is that it turns out not to be really necessary to wait until
+the job's counter accumulates all the delivery slots in advance. Say we
+have a ten-recipient mail followed by two two-recipient mails. If the
+preemption happened only when enough delivery slots had accumulated
+(assuming slot cost 2), the delivery sequence would be 11112211113311.
+Now what would we get if we waited only for 50% of the necessary slots to
+accumulate, promising to wait for the remaining 50% later, after we get
+back to the preempted job? With such a slot loan, the delivery sequence
+becomes 11221111331111. As we can see, this is not considerably worse for
+the delivery of the ten-recipient mail, but it allows the small messages
+to be delivered sooner.
+
+The concept of these slot loans is where the transport_delivery_slot_discount
+and transport_delivery_slot_loan come from (they default to
+default_delivery_slot_discount and default_delivery_slot_loan, whose values are
+by default 50 and 3, respectively). The discount (resp. loan) specifies
+how many percent (resp. how many slots) one "gets in advance", when the
+amount of slots required to deliver the best candidate is compared with
+the amount of slots the current job has accumulated so far.
+
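+The main.cf parameters involved, shown with the default values mentioned
+in this text (per-transport overrides replace "default" with the
+transport name):
+
+    /etc/postfix/main.cf:
+        default_delivery_slot_cost = 5
+        default_minimum_delivery_slots = 3
+        default_delivery_slot_discount = 50
+        default_delivery_slot_loan = 3
+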
+And that pretty much concludes this chapter.
+
+[Now you should have a feeling that you pretty much understand the scheduler
+and the preemption, or at least that you will have it after you read the
+last chapter a couple more times. You shall clearly see the job list and
+the preemption happening at its head, in ideal delivery conditions. The
+feeling of understanding shall last until you start wondering what
+happens if some of the jobs are blocked, which you might eventually
+figure out correctly from what has been said already. But I would be
+surprised if your mental image of the scheduler's functionality is not
+completely shattered once you start wondering how it works when not all
+recipients may be read in-core. More on that later.]
+
+H\bHo\bow\bw d\bde\bes\bst\bti\bin\bna\bat\bti\bio\bon\bn c\bco\bon\bnc\bcu\bur\brr\bre\ben\bnc\bcy\by l\bli\bim\bmi\bit\bts\bs a\baf\bff\bfe\bec\bct\bt t\bth\bhe\be s\bsc\bch\bhe\bed\bdu\bul\bli\bin\bng\bg a\bal\blg\bgo\bor\bri\bit\bth\bhm\bm
+
+The nqmgr uses the same algorithm for destination concurrency control as oqmgr.
+Now what happens when the destination limits are reached and no more entries
+for that destination may be selected by the scheduler?
+
+From the user's point of view it is all simple. If some of the peers of
+a job can't
+be selected, those peers are simply skipped by the entry selection algorithm
+(the pseudo-code described before) and only the selectable ones are used. If
+none of the peers may be selected, the job is declared a "blocker job". Blocker
+jobs are skipped by the entry selection algorithm and they are also excluded
+from the candidates for preemption of current job. Thus the scheduler
+effectively behaves as if the blocker jobs didn't exist on the job list at all.
+As soon as at least one of the peers of a blocker job becomes unblocked (that
+is, the delivery agent handling the delivery of a recipient entry for
+that destination successfully finishes), the job's blocker status is
+removed and the
+job again participates in all further scheduler actions normally.
+
+So the summary is that users don't really have to be concerned about the
+interaction of the destination limits and the scheduling algorithm. It
+works well on its own, and there are no knobs they need to control it
+with.
+
+From a programmer's point of view, the blocker jobs complicate the scheduler
+quite a lot. Without them, the jobs on the job list would normally be
+delivered
+in strict FIFO order. If the current job is preempted, the job preempting it is
+completely delivered unless it is preempted itself. Without blockers, the
+current job is thus always either the first job on the job list, or the top of
+the stack of jobs preempting the first job on the job list.
+
+The visualization of the job list and the preemption stack without blockers
+would be like this:
+
+ first job-> 1--2--3--5--6--8--... <- job list
+ on job list |
+ 4 <- preemption stack
+ |
+ current job-> 7
+
+In the example above we see that job 1 was preempted by job 4 and then job 4
+was preempted by job 7. After job 7 is completed, remaining entries of job 4
+are selected, and once they are all selected, job 1 continues.
+
+As we see, it's all very clean and straightforward. Now how does this change
+because of blockers?
+
+The answer is: a lot. Any job may become a blocker job at any time, and
+may also become a normal job again at any time. This has several
+important implications:
+
+ 1. The jobs may be completed in arbitrary order. For example, in the example
+ above, if the current job 7 becomes blocked, the next job 4 may complete
+ before the job 7 becomes unblocked again. Or if both 7 and 4 are blocked,
+ then 1 is completed, then 7 becomes unblocked and is completed, then 2 is
+ completed and only after that 4 becomes unblocked and is completed... You
+ get the idea.
+
+ [Interesting side note: even when jobs are delivered out of order, from
+ a single destination's point of view the jobs are still delivered in the
+ expected order (that is, FIFO unless there was some preemption involved).
+ This is because whenever a destination queue becomes unblocked (the
+ destination limit allows selection of more recipient entries for that
+ destination), all jobs which have peers for that destination are unblocked
+ at once.]
+
+ 2. The idea of the preemption stack at the head of the job list is gone. That
+ is, it must be possible to preempt any job on the job list. For example, if
+ the jobs 7, 4, 1 and 2 in the example above become all blocked, job 3
+ becomes the current job. And of course we do not want the preemption to
+ be affected by whether or not there are blocked jobs. Therefore, if
+ it turns out that job 3 might be preempted by job 6, the implementation
+ shall make it possible.
+
+ 3. The idea of the linear preemption stack itself is gone. It's no longer true
+ that one job is always preempted by at most one job at a time (that is,
+ directly preempted, not counting the recursively nested jobs). For example,
+ in the example above, job 1 is directly preempted by only job 4, and job 4
+ by job 7. Now assume job 7 becomes blocked, and job 4 is being delivered.
+ If it accumulates enough delivery slots, it is natural that it might be
+ preempted for example by job 8. Now job 4 is preempted by both job 7 AND
+ job 8 at the same time.
+
+Now combine the points 2) and 3) with point 1) again and you realize that the
+relations on the once linear job list became pretty complicated. If we extend
+the point 3) example: jobs 7 and 8 preempt job 4, now job 8 becomes blocked
+too, then job 4 completes. Tricky, huh?
+
+If I illustrate the relations after the above mentioned examples (except
+those in point 1)), the situation would look like this:
+
+ v- parent
+
+ adoptive parent -> 1--2--3--5--... <- "stack" level 0
+ | |
+ parent gone -> ? 6 <- "stack" level 1
+ / \
+ children -> 7 8 ^- child <- "stack" level 2
+
+ ^- siblings
+
+Now how does nqmgr deal with all these complicated relations?
+
+Well, it maintains them all as described, but fortunately, all these relations
+are necessary only for purposes of proper counting of available delivery slots.
+For purposes of ordering the jobs for entry selection, the original rule still
+applies: "the job preempting the current job is moved in front of the current
+job on the job list". So for entry selection purposes, the job relations remain
+as simple as this:
+
+ 7--8--1--2--6--3--5--.. <- scheduler's job list order
+
+The job list order and the preemption parent/child/siblings relations are
+maintained separately. And because the selection works only with the job list,
+you can happily forget about those complicated relations unless you want to
+study the nqmgr sources. In that case the text above might provide some helpful
+introduction to the problem domain. Otherwise I suggest you just forget about
+all this and stick with the user's point of view: the blocker jobs are simply
+ignored.
+
+[By now, you should have a feeling that there are more things going on
+under the hood than you ever wanted to know. You decide that forgetting
+about this chapter is the best you can do for the sake of your mind's
+health, and you basically stick with the idea of how the scheduler works
+in ideal conditions, when there are no blockers, which is good enough.]
+
+D\bDe\bea\bal\bli\bin\bng\bg w\bwi\bit\bth\bh m\bme\bem\bmo\bor\bry\by r\bre\bes\bso\bou\bur\brc\bce\be l\bli\bim\bmi\bit\bts\bs
+
+When discussing the nqmgr scheduler, we have so far assumed that all
+recipients of all messages in the active queue are completely read into
+memory. This is simply not true. There is an upper bound on the amount of
+memory the nqmgr may use, and therefore it must impose some limits on the
+information it may store in memory at any given time.
+
+First of all, not all messages may be read in-core at once. At any time,
+at most qmgr_message_active_limit messages may be read in-core. When read
+into memory, the messages are picked from the incoming and deferred
+message queues and moved to the active queue (incoming having priority);
+if there are more than qmgr_message_active_limit messages destined for
+the active queue, the rest will have to wait until (some of) the messages
+in the active queue are completely delivered (or deferred).
+
+Even with the limited amount of in-core messages, there is another limit
+which must be imposed in order to avoid memory exhaustion. Each message
+may contain a huge number of recipients (tens or hundreds of thousands
+are not uncommon), so if nqmgr were to read all recipients of all
+messages in the active queue, it might easily run out of memory.
+Therefore there must be some upper bound on the number of message
+recipients which are read into memory at the same time.
+
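+The parameters involved, with what are believed to be the stock default
+values (verify with "postconf -d" on your system):
+
+    /etc/postfix/main.cf:
+        qmgr_message_active_limit = 20000
+        qmgr_message_recipient_limit = 20000
+        qmgr_message_recipient_minimum = 10
+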
+Before discussing how exactly nqmgr implements the recipient limits, let's see
+how the sole existence of the limits themselves affects the nqmgr and its
+scheduler.
+
+The message limit is straightforward - it just limits the size of the lookahead
+the nqmgr's scheduler has when choosing which message can preempt the current
+one. Messages not in the active queue simply are not considered at all.
+
+The recipient limit complicates things more. First of all, the message
+reading code must support reading the recipients in batches, which among
+other things means accessing the queue file several times and continuing
+where the last recipient batch ended. This is invoked by the scheduler
+whenever the current job runs out of in-core recipients and more are
+required. It is also done whenever all in-core recipients of the message
+have been dealt with (which may also mean they were deferred) but there
+are still more in the queue file.
+
+The second complication is that with some recipients left unread in the queue
+file, the scheduler can't operate with exact counts of recipient entries. With
+unread recipients, it is not clear how many recipient entries there will be, as
+they are subject to per-destination grouping. It is not even clear to what
+transports (and thus jobs) the recipients will be assigned. And with messages
+coming from the deferred queue, it is not even clear how many unread recipients
+are still to be delivered. This all means that the scheduler must use
+only estimates of how many recipient entries there will be. Fortunately,
+it is possible to estimate the minimum and maximum correctly, so the
+scheduler can always err on the safe side. Obviously, the better the
+estimates, the better the results, so it is best when we are able to read
+all recipients in-core and turn
+the estimates into exact counts, or at least try to read as many as possible to
+make the estimates as accurate as possible.
+
+The third complication is that it is no longer true that the scheduler is done
+with a job once all of its in-core recipients are delivered. It is possible
+that the job will be revived later, when another batch of recipients is read in
+core. It is also possible that some jobs will be created for the first time
+long after the first batch of recipients was read in core. The nqmgr code must
+be ready to handle all such situations.
+
+And finally, the fourth complication is that the nqmgr code must somehow impose
+the recipient limit itself. Now how does it achieve that?
+
+Perhaps the easiest solution would be to say that each message may have
+at most X recipients stored in-core, but such a solution would be poor
+for several reasons. With reasonable qmgr_message_active_limit values,
+the X would have to be quite low to maintain a reasonable memory
+footprint. And with a low X, lots of things would not work well. The
+nqmgr would have problems using the
+transport_destination_recipient_limit efficiently. The scheduler's
+preemption would be suboptimal, as the recipient count estimates would be
+inaccurate. The message queue file would have to be accessed many times
+to read in more recipients again and again.
+
+Therefore it seems reasonable to have a solution which does not use a
+limit imposed on a per-message basis, but which maintains a pool of
+available recipient slots that can be shared among all messages in the
+most efficient manner. And as we do not want separate transports to
+compete for resources whenever possible, it seems appropriate to maintain
+such a recipient pool for each transport separately. This is the general
+idea; now how does it work in practice?
+
+First we have to solve a little chicken-and-egg problem. If we want to
+use the per-transport recipient pools, we first need to know to what
+transport(s) the message is assigned. But we will find that out only
+after we have read in the recipients. So it is obvious that we first have
+to read in some recipients, use them to find out to what transports the
+message is to be assigned, and only after that can we use the
+per-transport recipient pools.
+
+Now how many recipients shall we read for the first time? This is what
+qmgr_message_recipient_minimum and qmgr_message_recipient_limit values control.
+The qmgr_message_recipient_minimum value specifies how many recipients of each
+message we will read for the first time, no matter what. It is necessary to
+read at least one recipients before we can assign the message to a transport
+and create the first job. However, reading only qmgr_message_recipient_minimum
+recipients even if there are only few messages with few messages in-core would
+be wasteful. Therefore if there is less than qmgr_message_recipient_limit
+recipients in-core so far, the first batch of recipients may be larger than
+qmgr_message_recipient_minimum - as large as is required to reach the
+qmgr_message_recipient_limit limit.
+
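+As a sketch (with illustrative names), the first batch size is:
+
+    /* First recipient batch for a newly picked-up message: at least the
+     * per-message minimum, and more while the global in-core total is
+     * still below qmgr_message_recipient_limit. Illustrative only. */
+    long first_batch_size(long recipient_minimum, long recipient_limit,
+                          long in_core_total)
+    {
+        long batch = recipient_minimum;
+
+        if (recipient_limit - in_core_total > batch)
+            batch = recipient_limit - in_core_total;
+        return batch;
+    }
+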
+Once the first batch of recipients has been read in core and the message
+jobs have been created, the size of the subsequent recipient batches (if
+any - of course it's
+best when all recipients are read in one batch) is based solely on the position
+of the message jobs on their corresponding transport's job lists. Each
+transport has a pool of transport_recipient_limit recipient slots which it can
+distribute among its jobs (how this is done is described later). The subsequent
+recipient batch may be as large as the sum of all recipient slots of all jobs
+of the message permits (plus the qmgr_message_recipient_minimum amount which
+always applies).
+
+For example, if a message has three jobs, the first with 1 recipient
+still in-core and 4 recipient slots, the second with 5 recipients in-core
+and 5 recipient slots, and the third with 2 recipients in-core and 0
+recipient slots, it has 1+5+2=7 recipients in-core and 4+5+0=9 jobs'
+recipient slots in total. This means that we could immediately read
+2+qmgr_message_recipient_minimum more recipients of that message in core.
+
+The above example illustrates several things which might be worth
+mentioning explicitly: first, note that although the per-transport slots
+are assigned to particular jobs, we can't guarantee that once the next
+batch of recipients is read in core, the corresponding amounts of
+recipients will be assigned to those jobs. The jobs lend their slots to
+the message as a whole, so it is possible that some jobs end up
+sponsoring other jobs of their message. For example, if in the example
+above the 2 newly read recipients were assigned to the second job, the
+first job sponsored the second job with 2 slots. The second notable thing
+is the third job, which has more recipients in-core than it has slots.
+Apart from the sponsoring by another job that we just saw, this can be
+the result of the first recipient batch, which is sponsored from the
+global recipient pool of qmgr_message_recipient_limit recipients. It can
+also be sponsored from the message recipient pool of
+qmgr_message_recipient_minimum recipients.
+
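+The arithmetic of the example above, as an illustrative sketch:
+
+    /* How many more recipients of a message may be read in core: the
+     * unused job slots (total slots minus recipients in-core) plus the
+     * always-available qmgr_message_recipient_minimum. */
+    long next_batch_size(const long *in_core, const long *slots,
+                         int njobs, long recipient_minimum)
+    {
+        long total_in_core = 0, total_slots = 0;
+
+        for (int i = 0; i < njobs; i++) {
+            total_in_core += in_core[i];
+            total_slots += slots[i];
+        }
+        if (total_slots < total_in_core)    /* sponsored jobs may hold */
+            total_slots = total_in_core;    /* more recipients than slots */
+        return total_slots - total_in_core + recipient_minimum;
+    }
+
+With the example values {1, 5, 2} recipients in-core and {4, 5, 0} slots,
+this yields 9 - 7 + qmgr_message_recipient_minimum, matching the
+2+qmgr_message_recipient_minimum above.
+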
+Now how does each transport distribute the recipient slots among its jobs? The
+strategy is quite simple. As most scheduler activity happens on the head of the
+job list, it is our intention to make sure that the scheduler has the best
+estimates of the recipient counts for those jobs. As we mentioned above, this
+means that we want to try to make sure that the messages of those jobs have all
+recipients read in-core. Therefore the transport distributes the slots "along"
+the job list from start to end. In this case the job list sorted by message
+enqueue time is used, because it doesn't change over time as the scheduler's
+job list does.
+
+More specifically, each time a job is created and appended to the job list, it
+gets all unused recipient slots from its transport's pool. It keeps them until
+all recipients of its message are read. When this happens, all unused
+recipient slots are transferred to the next job on the job list which
+still has some recipients unread (which is in fact now the first such
+job), or eventually back to
+the transport pool if there is no such job. Such transfer then also happens
+whenever a recipient entry of that job is delivered.
+
+There is also a scenario where a job is not appended to the end of the
+job list (for example, it was created as a result of a second or later
+recipient batch).
+Then it works exactly as above, except that if it was put in front of the first
+unread job (that is, the job of a message which still has some unread
+recipients in queue file), that job is first forced to return all of its unused
+recipient slots to the transport pool.
+
+The algorithm just described leads to the following state: The first unread job
+on the job list always gets all the remaining recipient slots of that transport
+(if there are any). The jobs queued before this job are completely read
+(that is, all recipients of their message were already read in core) and
+have at most as many slots as they still have recipients in-core (the
+maximum is there because of the sponsoring mentioned before), and the
+jobs after this job get nothing from the transport recipient pool (unless
+they got something before and the first unread job was later created and
+enqueued in front of them - in such a case they also get at most as many
+slots as they have recipients in-core).
+
+Things work fine in such a state most of the time, because the current
+job is either completely read in-core or has as many recipient slots as
+there are, but there is one situation which we still have to handle
+specially. Imagine that the current job is preempted by some unread job
+from the job list and there are no more recipient slots available, so
+this new current job could read only batches of
+qmgr_message_recipient_minimum recipients at a time. This would really
+degrade performance. For this reason, each transport has an extra pool of
+transport_extra_recipient_limit recipient slots, dedicated exactly to
+this situation. Each time an unread job preempts the current job, it gets
+half of the remaining recipient slots from the normal pool and this extra
+pool.
+
+And that's it. It sure does sound pretty complicated, but fortunately
+most people don't really have to care how exactly it works as long as it
+works. Perhaps the only important things to know for most people are the
+following upper-bound formulas:
+
+Each transport has at maximum
+
+ max(
+ qmgr_message_recipient_minimum * qmgr_message_active_limit
+ + *_recipient_limit + *_extra_recipient_limit,
+ qmgr_message_recipient_limit
+ )
+
+recipients in core.
+
+The total amount of recipients in core is
+
+ max(
+ qmgr_message_recipient_minimum * qmgr_message_active_limit
+ + sum( *_recipient_limit + *_extra_recipient_limit ),
+ qmgr_message_recipient_limit
+ )
+
+where the sum is over all used transports.
+
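+For example, with one transport and what are believed to be the stock
+defaults (qmgr_message_recipient_minimum = 10, qmgr_message_active_limit
+= 20000, *_recipient_limit = 20000, *_extra_recipient_limit = 1000;
+verify with "postconf -d"), the per-transport bound evaluates to
+
+    max(10 * 20000 + 20000 + 1000, 20000) = 221000
+
+recipients in core.
+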
+And this terribly complicated chapter concludes the documentation of the
+nqmgr scheduler.
+
+[By now you should theoretically know the nqmgr scheduler inside out. In
+practice, you still hope that you will never have to really understand the last
+or last two chapters completely, and fortunately most people really
+won't. Understanding how the scheduler works in ideal conditions is more
+than good enough for the vast majority of users.]
C\bCr\bre\bed\bdi\bit\bts\bs
site detection.
* These simplifications, and their modular implementation, helped to develop
further insights into the different roles that positive and negative
- concurrency feedback play, and helped to avoid all the known worst-case
+ concurrency feedback play, and helped to identify some worst-case
scenarios.
/etc/postfix/main.cf:
smtp_tls_CAfile = /etc/postfix/cacert.pem
smtp_tls_session_cache_database =
- btree:/var/spool/postfix/smtp_tls_session_cache
+ btree:/var/lib/postfix/smtp_tls_session_cache
smtp_use_tls = yes
smtpd_tls_CAfile = /etc/postfix/cacert.pem
smtpd_tls_cert_file = /etc/postfix/FOO-cert.pem
smtpd_tls_key_file = /etc/postfix/FOO-key.pem
smtpd_tls_received_header = yes
smtpd_tls_session_cache_database =
- btree:/var/spool/postfix/smtpd_tls_session_cache
+ btree:/var/lib/postfix/smtpd_tls_session_cache
tls_random_source = dev:/dev/urandom
# Postfix 2.3 and later
smtpd_tls_security_level = may
# ==========================================================================
smtp inet n - n - - smtpd
#submission inet n - n - - smtpd
-# -o smtpd_enforce_tls=yes
+# -o smtpd_tls_security_level=encrypt
# -o smtpd_sasl_auth_enable=yes
# -o smtpd_client_restrictions=permit_sasl_authenticated,reject
# -o milter_macro_daemon_name=ORIGINATING
<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN"
- "http://www.w3.org/TR/html4/loose.dtd">
+ "http://www.w3.org/TR/html4/loose.dtd">
<html>
<body>
-<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix Queue Scheduler</h1>
+<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix
+Queue Scheduler</h1>
<hr>
the last delivery attempt. There are two major classes of mechanisms
that control the operation of the queue manager. </p>
-<p> The first class of mechanisms is concerned with the number of
-concurrent deliveries to a specific destination, including decisions
-on when to suspend deliveries after persistent failures: </p>
-
- <ul>
-
- <li> <a href="#concurrency"> Concurrency scheduling </a>
-
- <ul>
-
- <li> <a href="#concurrency_summary_2_5"> Summary of the
- Postfix 2.5 concurrency feedback algorithm </a>
-
- <li> <a href="#dead_summary_2_5"> Summary of the Postfix
- 2.5 "dead destination" detection algorithm </a>
-
- <li> <a href="#pseudo_code_2_5"> Pseudocode for the Postfix
- 2.5 concurrency scheduler </a>
+<ul>
- <li> <a href="#concurrency_results"> Results for delivery
- to concurrency limited servers </a>
+<li> <a href="#concurrency"> Concurrency scheduling </a> is concerned
+with the number of concurrent deliveries to a specific destination,
+including decisions on when to suspend deliveries after persistent
+failures.
- <li> <a href="#concurrency_discussion"> Discussion of
- concurrency limited server results </a>
+<li> <a href="#jobs"> Preemptive scheduling </a> is concerned with
+the selection of email messages and recipients for a given destination.
- <li> <a href="#concurrency_limitations"> Limitations of
- less-than-1 per delivery feedback </a>
+<li> <a href="#credits"> Credits </a>. This document would not be
+complete without them.
- <li> <a href="#concurrency_config"> Concurrency configuration
- parameters </a>
+</ul>
- </ul>
+<!--
- </ul>
+<p> Once started, the <a href="qmgr.8.html">qmgr(8)</a> process runs until "postfix reload"
+or "postfix stop". As a persistent process, the queue manager has
+to meet strict requirements with respect to code correctness and
+robustness. Unlike non-persistent daemon processes, the queue manager
+cannot benefit from Postfix's process rejuvenation mechanism that
+limits the impact from resource leaks and other coding errors
+(translation: replacing a process after a short time covers up bugs
+before they can become a problem). </p>
-<p> The second class of mechanisms is concerned with the selection
-of what mail to deliver to a given destination: </p>
+-->
- <ul>
+<h2> <a name="concurrency"> Concurrency scheduling </a> </h2>
- <li> <a href="#jobs"> Preemptive scheduling </a>
+<p> The following sections document the Postfix 2.5 concurrency
+scheduler, after a discussion of the limitations of the existing
+concurrency scheduler. This is followed by results of medium-concurrency
+experiments, and a discussion of trade-offs between performance and
+robustness. </p>
- <ul>
+<p> The material is organized as follows: </p>
- <li> <a href="#job_motivation"> Why the non-preemptive Postfix queue
- manager was replaced </a>
+<ul>
- <li> <a href="#job_design"> How the non-preemptive queue manager
- scheduler works </a>
+<li> <a href="#concurrency_drawbacks"> Drawbacks of the existing
+concurrency scheduler </a>
- </ul>
+<li> <a href="#concurrency_summary_2_5"> Summary of the Postfix 2.5
+concurrency feedback algorithm </a>
- </ul>
+<li> <a href="#dead_summary_2_5"> Summary of the Postfix 2.5 "dead
+destination" detection algorithm </a>
-<p> And this document would not be complete without: </p>
+<li> <a href="#pseudo_code_2_5"> Pseudocode for the Postfix 2.5
+concurrency scheduler </a>
- <ul>
+<li> <a href="#concurrency_results"> Results for delivery to
+concurrency limited servers </a>
- <li> <a href="#credits"> Credits </a>
+<li> <a href="#concurrency_discussion"> Discussion of concurrency
+limited server results </a>
- </ul>
+<li> <a href="#concurrency_limitations"> Limitations of less-than-1
+per delivery feedback </a>
-<!--
+<li> <a href="#concurrency_config"> Concurrency configuration
+parameters </a>
-<p> Once started, the <a href="qmgr.8.html">qmgr(8)</a> process runs until "postfix reload"
-or "postfix stop". As a persistent process, the queue manager has
-to meet strict requirements with respect to code correctness and
-robustness. Unlike non-persistent daemon processes, the queue manager
-cannot benefit from Postfix's process rejuvenation mechanism that
-limit the impact from resource leaks and other coding errors
-(translation: replacing a process after a short time covers up bugs
-before they can become a problem). </p>
-
--->
+</ul>
-<h2> <a name="concurrency"> Concurrency scheduling </a> </h2>
+<h3> <a name="concurrency_drawbacks"> Drawbacks of the existing
+concurrency scheduler </a> </h3>
-<p> This section documents the Postfix 2.5 concurrency scheduler.
-Prior Postfix versions used a simple but robust algorithm where the
-per-destination delivery concurrency was decremented by 1 after a
-delivery suffered connection or handshake failure, and was incremented
-by 1 otherwise. Of course the concurrency was never allowed to
-exceed the maximum per-destination concurrency limit. And when a
-destination's concurrency level dropped to zero, the destination
-was declared "dead" and delivery was suspended. </p>
+<p> From the start, Postfix has used a simple but robust algorithm
+where the per-destination delivery concurrency is decremented by 1
+after a delivery suffered connection or handshake failure, and
+incremented by 1 otherwise. Of course the concurrency is never
+allowed to exceed the maximum per-destination concurrency limit.
+And when a destination's concurrency level drops to zero, the
+destination is declared "dead" and delivery is suspended. </p>
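+
+<p> As a minimal illustrative sketch (not the actual qmgr(8) source
+code; all names are made up), the old +/-1 rule could be written
+like this: </p>
+
+<blockquote>
+<pre>
+/* Illustrative sketch of the historical +/-1 feedback rule. */
+void    update_concurrency(int *concurrency, int max_concurrency, int failed)
+{
+    if (failed)
+        *concurrency -= 1;              /* connection/handshake failure */
+    else if (*concurrency < max_concurrency)
+        *concurrency += 1;              /* success, capped at the limit */
+    if (*concurrency <= 0) {
+        *concurrency = 0;               /* destination declared "dead" */
+        /* ... suspend deliveries to this destination ... */
+    }
+}
+</pre>
+</blockquote>
+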
-<p> Drawbacks of the old +/-1 feedback per delivery are: <p>
+<p> Drawbacks of +/-1 concurrency feedback per delivery are: </p>
<ul>
<li> <p> Overshoot due to exponential delivery concurrency growth
-with each pseudo-cohort(*). For example, with the default initial
-concurrency of 5, concurrency would proceed over time as (5-10-20).
-</p>
+with each pseudo-cohort(*). This can be an issue with high-concurrency
+channels. For example, with the default initial concurrency of 5,
+concurrency would proceed over time as (5-10-20). </p>
<li> <p> Throttling down to zero concurrency after a single
pseudo-cohort(*) failure. This was especially an issue with
low-concurrency channels, where a single pseudo-cohort failure
was enough to mark a destination as "dead" and suspend further
deliveries. </p>
-<h3> <a name="concurrency_summary_2_5"> Summary of the Postfix 2.5 concurrency feedback algorithm </a> </h3>
+<h3> <a name="concurrency_summary_2_5"> Summary of the Postfix 2.5
+concurrency feedback algorithm </a> </h3>
<p> We want to increment a destination's delivery concurrency when
some (not necessarily consecutive) number of deliveries complete
<p> All results in the previous sections are based on the first
delivery runs only; they do not include any second etc. delivery
-attempts. The first two examples show that the feedback method
-matters little when concurrency is limited due to congestion. This
+attempts. The first two examples show that the effect of feedback
+is negligible when concurrency is limited due to congestion. This
is because the initial concurrency is already at the client's
concurrency maximum, and because there is 10-100 times more positive
-than negative feedback. Under these conditions, the contribution
-from SMTP connection caching is negligible. </p>
+than negative feedback. Under these conditions, it is no surprise
+that the contribution from SMTP connection caching is also negligible.
+</p>
<p> In the last example, the old +/-1 feedback per delivery will
defer 50% of the mail when confronted with an active (anvil-style)
<h2> <a name="jobs"> Preemptive scheduling </a> </h2>
-<p> This is the beginning of documentation for a preemptive queue
-manager scheduling algorithm by Patrik Rak. For a long time, this
-code was made available under the name "nqmgr(8)" (new queue manager),
-as an optional module. As of Postfix 2.1 this is the default queue
-manager, which is always called "<a href="qmgr.8.html">qmgr(8)</a>". The old queue manager
-will for some time will be available under the name of "<a href="qmgr.8.html">oqmgr(8)</a>".
+<p>
+
+This document attempts to describe the new queue manager and its
+preemptive scheduler algorithm. Note that the document was originally
+written to describe the changes between the new queue manager (in
+this text referred to as <tt>nqmgr</tt>, the name it was known by
+before it became the default queue manager) and the old queue manager
+(referred to as <tt>oqmgr</tt>). This is why it refers to <tt>oqmgr</tt>
+every so often.
+
</p>
-<h3> <a name="job_motivation"> Why the non-preemptive Postfix queue manager was replaced </a> </h3>
+<p>
-<p> The non-preemptive Postfix scheduler had several limitations
-due to unfortunate choices in its design. </p>
+This document is divided into sections as follows:
-<ol>
+</p>
- <li> <p> Round-robin selection by destination for mail that is
- delivered via the same message delivery transport. The round-robin
- strategy was chosen with the intention to prevent a single
- (destination) site from using up too many mail delivery resources.
- However, that strategy penalized inbound mail on bi-directional
- gateways. The poor suffering inbound destination would be
- selected only 1/number-of-destinations of the time, even when
- it had more mail than other destinations, and thus mail could
- be delayed. </p>
-
- <p> Victor Duchovni found a workaround: use different message
- delivery transports, and thus avoid the starvation problem.
- The Patrik Rak scheduler solves this problem by using FIFO
- selection. </p>
-
- <li> <p> A second limitation of the old Postfix scheduler was
- that delivery of bulk mail would block all other deliveries,
- causing large delays. Patrik Rak's scheduler allows mail with
- fewer recipients to slip past bulk mail in an elegant manner.
- </p>
+<ul>
-</ol>
+<li> <a href="#<tt>nqmgr</tt>_structures"> The structures used by
+nqmgr </a>
+
+<li> <a href="#<tt>nqmgr</tt>_pickup"> What happens when nqmgr picks
+up the message </a> - how it is assigned to transports, jobs, peers,
+entries
+
+<li> <a href="#<tt>nqmgr</tt>_selection"> How does the entry selection
+work </a>
+
+<li> <a href="#<tt>nqmgr</tt>_preemption"> How does the preemption
+work </a> - what messages may be preempted and how and what messages
+are chosen to preempt them
+
+<li> <a href="#<tt>nqmgr</tt>_concurrency"> How destination concurrency
+limits affect the scheduling algorithm </a>
+
+<li> <a href="#<tt>nqmgr</tt>_memory"> Dealing with memory resource
+limits </a>
+
+</ul>
+
+<h3> <a name="<tt>nqmgr</tt>_structures"> The structures used by
+nqmgr </a> </h3>
+
+<p>
+
+Let's start by recapitulating the structures and terms used when
+referring to the queue manager and how it operates. Many of these are
+partially described elsewhere, but it is nice to have a coherent
+overview in one place:
+
+</p>
+
+<ul>
+
+<li> <p> Each message structure represents one mail message which
+Postfix is to deliver. The message recipients specify the destinations
+the message is to be delivered to and the transports that will be
+used for the delivery. </p>
+
+<li> <p> Each recipient entry groups a batch of recipients of one
+message which are all going to be delivered to the same destination.
+</p>
+
+<li> <p> Each transport structure groups everything that is going
+to be delivered by the delivery agents dedicated to that transport.
+Each transport maintains a set of queues (describing the destinations
+it shall talk to) and jobs (referencing the messages it shall
+deliver). </p>
+
+<li> <p> Each transport queue (not to be confused with the on-disk
+<a href="QSHAPE_README.html#active_queue">active queue</a> or <a href="QSHAPE_README.html#incoming_queue">incoming queue</a>) groups everything what is going be
+delivered to given destination (aka nexthop) by its transport. Each
+queue belongs to one transport, so each destination may be referred
+to by several queues, one for each transport. Each queue maintains
+a list of all recipient entries (batches of message recipients)
+which shall be delivered to given destination (the todo list), and
+a list of recipient entries already being delivered by the delivery
+agents (the busy list). </p>
+
+<li> <p> Each queue corresponds to multiple peer structures. Each
+peer structure is like the queue structure, belonging to one transport
+and referencing one destination. The difference is that it lists
+only the recipient entries which all originate from the same message,
+unlike the queue structure, whose entries may originate from various
+messages. For messages with few recipients, there is usually just
+one recipient entry for each destination, resulting in one recipient
+entry per peer. But for large mailing list messages the recipients
+may need to be split to multiple recipient entries, in which case
+the peer structure may list many entries for a single destination.
+</p>
+
+<li> <p> Each transport job groups everything it takes to deliver
+one message via its transport. Each job represents one message
+within the context of the transport. The job belongs to one transport
+and message, so each message may have multiple jobs, one for each
+transport. The job groups all the peer structures, which describe
+the destinations the job's message has to be delivered to. </p>
+
+</ul>
+
+<p>
+
+The first four structures are common to both <tt>nqmgr</tt> and
+<tt>oqmgr</tt>; the latter two were introduced by <tt>nqmgr</tt>.
+
+</p>
+
+<p>
+
+These terms are used extensively in the text below; feel free to
+look up the descriptions above whenever you feel you have lost
+track of what is what.
+
+</p>
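+
+<p> As an illustrative aid only (simplified, with assumed field
+names; the real declarations in the qmgr sources differ in naming
+and detail), the relationships above could be sketched in C like
+this: </p>
+
+<blockquote>
+<pre>
+/* Simplified sketch of the structures described above. */
+typedef struct MESSAGE   MESSAGE;       /* one mail message */
+typedef struct TRANSPORT TRANSPORT;     /* one delivery transport */
+typedef struct QUEUE     QUEUE;         /* per-transport destination queue */
+typedef struct JOB       JOB;           /* one message within one transport */
+typedef struct PEER      PEER;          /* one destination within one job */
+typedef struct ENTRY     ENTRY;         /* recipient batch, same destination */
+
+struct TRANSPORT { QUEUE *queues; JOB *job_list; };
+struct QUEUE     { TRANSPORT *transport; ENTRY *todo_list; ENTRY *busy_list; };
+struct JOB       { TRANSPORT *transport; MESSAGE *message; PEER *peers; };
+struct PEER      { JOB *job; QUEUE *queue; ENTRY *entries; };
+struct ENTRY     { QUEUE *queue; PEER *peer; /* plus the recipients */ };
+struct MESSAGE   { long enqueue_time; JOB *jobs; /* plus recipients etc. */ };
+</pre>
+</blockquote>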
+
+<h3> <a name="<tt>nqmgr</tt>_pickup"> What happens when nqmgr picks
+up the message </a> </h3>
+
+<p>
+
+Whenever <tt>nqmgr</tt> moves a queue file into the <a href="QSHAPE_README.html#active_queue">active queue</a>,
+the following happens: It reads all necessary information from the
+queue file as <tt>oqmgr</tt> does, and also reads as many recipients
+as possible - more on that later; for now, let's just pretend it
+always reads all recipients.
+
+</p>
+
+<p>
+
+Then it resolves the recipients as <tt>oqmgr</tt> does, which
+means obtaining an (address, nexthop, transport) triple for each
+recipient. For each triple, it finds the transport; if it does not
+exist yet, it instantiates it (unless it's dead). Within the
+transport, it finds the destination queue for the given nexthop; if it
+does not exist yet, it instantiates it (unless it's dead). The
+triple is then bound to the given destination queue. This happens in
+qmgr_resolve() and is basically the same as in <tt>oqmgr</tt>.
+
+</p>
+
+<p>
+
+Then for each triple which was bound to some queue (and thus
+transport), the program finds the job which represents the message
+within that transport's context; if it does not exist yet, it
+instantiates it. Within the job, it finds the peer which represents
+the bound destination queue within this job's context; if it does
+not exist yet, it instantiates it. Finally, it stores the address
+from the resolved triple in the recipient entry which is appended
+to both the queue entry list and the peer entry list. The addresses
+for the same nexthop are batched in the entries up to the
+recipient_concurrency limit for that transport. This happens in
+qmgr_assign() and, apart from the handling of the job and peer
+structures, is basically the same as in <tt>oqmgr</tt>.
+
+</p>
+
+<p>
+
+When the job is instantiated, it is enqueued on the transport's job
+list based on the time its message was picked up by <tt>nqmgr</tt>.
+For the first batch of recipients this means it is appended to the
+end of the job list, but the ordering of the job list by enqueue
+time is important, as we will see shortly.
+
+</p>
+
+<p>
+
+[Now you should have a pretty good idea of the state of the
+<tt>nqmgr</tt> after a couple of messages have been picked up, and
+of the relation between all those job, peer, queue and entry structures.]
+
+</p>
+
+<h3> <a name="<tt>nqmgr</tt>_selection"> How does the entry selection
+work </a> </h3>
+
+<p>
+
+Having prepared all the structures mentioned above, the task of
+the <tt>nqmgr</tt>'s scheduler is to choose the recipient entries
+one at a time and pass them to the delivery agent for the corresponding
+transport. Now how does this work?
+
+</p>
+
+<p>
+
+The first approximation of the new scheduling algorithm is like this:
+
+</p>
+
+<blockquote>
+<pre>
+foreach transport (round-robin-by-transport)
+do
+ if transport busy continue
+ if transport process limit reached continue
+ foreach transport's job (in the order of the transport's job list)
+ do
+        foreach job's peer (round-robin-by-destination)
+        do
+            if peer->queue->concurrency < peer->queue->window
+                return next peer entry.
+        done
+ done
+done
+</pre>
+</blockquote>
+
+<p>
+
+Now what is the "order of the transport's job list"? As we know
+already, the job list is by default kept in the order the message
+was picked up by the <tt>nqmgr</tt>. So by default we get round-robin
+selection among the transports at the top level, and FIFO message
+delivery within each transport. The round-robin selection of the peers
+by destination is perhaps of little importance in most real-life cases
+(unless the recipient_concurrency limit is reached, there is only
+one peer structure per destination in one job), but theoretically
+it makes sure that even within a single job, destinations are treated
+fairly.
+
+</p>
+
+<p>
+
+[By now you should have a feeling that you really know how the
+scheduler works under ideal conditions - that is, with no recipient
+resource limits and no destination concurrency problems - except
+for the preemption.]
+
+</p>
+
+<h3> <a name="<tt>nqmgr</tt>_preemption"> How does the preemption
+work </a> </h3>
+
+<p>
+
+As you might perhaps expect by now, the transport's job list does
+not remain sorted by the job's message enqueue time all the time.
+The coolest thing about <tt>nqmgr</tt> is not the simple FIFO
+delivery, but that it is able to slip mail with few recipients
+past the mailing-list bulk mail. This is what the job preemption
+is about - shuffling the jobs on the transport's job list to get
+the best message delivery rates. Now how is it achieved?
+
+</p>
+
+<p>
+
+First I have to tell you that there are in fact two job lists in
+each transport. One is the scheduler's job list, which the scheduler
+is free to play with, while the other one always keeps the jobs
+listed in the order of the enqueue time and is used for the recipient
+pool management we will discuss later. For now, we will deal with
+the scheduler's job list only.
+
+</p>
+
+<p>
+
+So, we have the job list, which is first ordered by the time the
+job's messages were enqueued, oldest messages first, the most recently
+picked one at the end. For now, let's assume that there are no
+destination concurrency problems. Without preemption, we pick some
+entry of the first (oldest) job on the queue, assign it to a delivery
+agent, pick another one from the same job, assign it again, and so
+on, until all the entries are used and the job is delivered. We
+would then move on to the next job, and so on. Now how do we
+manage to sneak in some entries from the recently added jobs when
+the first job on the job list belongs to a message going to a
+mailing list and has thousands of recipient entries?
+
+</p>
+
+<p>
+
+The <tt>nqmgr</tt>'s answer is that we can artificially "inflate"
+the delivery time of that first job by some constant for free - it
+is basically the same trick you might remember as "accumulation of
+potential" from the amortized complexity lessons. For example,
+instead of delivering the entries of the first job on the job list
+every time a delivery agent becomes available, we can do it only
+every second time. If you view the moments a delivery agent becomes
+available on a timeline as "delivery slots", then instead of using
+every delivery slot for the first job, we can use only every other
+slot, and still the overall delivery efficiency of the first job
+remains the same. So the delivery <tt>11112222</tt> becomes
+<tt>1.1.1.1.2.2.2.2</tt> (1 and 2 are the imaginary job numbers, .
+denotes the free slot). Now what do we do with free slots?
+
+</p>
+
+<p>
+
+As you might have guessed, we will use them for sneaking in the
+mail with few recipients. For example, if we have one four-recipient
+mail followed by four one-recipient mails, the delivery sequence
+(that is, the sequence in which the jobs are assigned to the
+delivery slots) might look like this: <tt>12131415</tt>. Hmm, fine
+for sneaking in the single-recipient mail, but how do we sneak in
+mail with more than one recipient? Say we have one four-recipient
+mail followed by two two-recipient mails?
+
+</p>
+
+<p>
+
+The simple answer would be to use the delivery sequence <tt>12121313</tt>.
+But the problem is that this does not scale well. Imagine you have
+mail with a thousand recipients followed by mail with a hundred
+recipients. It is tempting to suggest a delivery sequence like
+<tt>121212....</tt>, but alas! Imagine that another mail arrives
+with, say, ten recipients. There are no free slots anymore, so it
+can't slip by, not even if it had just one recipient. It will be
+stuck until the hundred-recipient mail is delivered, which really
+sucks.
+
+</p>
+
+<p>
+
+So, it becomes obvious that while inflating the message to get
+free slots is a great idea, one has to be really careful how the
+free slots are assigned, otherwise one might corner oneself. So,
+how does <tt>nqmgr</tt> really use the free slots?
+
+</p>
+
+<p>
+
+The key idea is that one does not have to generate the free slots
+in a uniform way. The delivery sequence <tt>111...1</tt> is no
+worse than <tt>1.1.1.1</tt>, in fact, it is even better as some
+entries are in the first case selected earlier than in the second
+case, and none is selected later! So it is possible to first
+"accumulate" the free delivery slots and then use them all at once.
+It is even possible to accumulate some, then use them, then accumulate
+some more and use them again, as in <tt>11..1.1</tt> .
+
+</p>
+
+<p>
+
+Let's get back to the hundred-recipient example. We now know
+that we could first accumulate one hundred free slots, and only
+then preempt the first job and sneak the hundred-recipient
+mail in. Applying the algorithm recursively, we see the
+hundred-recipient job can accumulate ten free delivery slots, and
+then we could preempt it and sneak in the ten-recipient mail...
+Wait wait wait! Could we? Aren't we overinflating the original one
+thousand recipient mail?
+
+</p>
+
+<p>
+
+Well, although it looks that way at first glance, another trick
+will allow us to answer "no, we are not!". If we had said that we will
+inflate the delivery time twice at maximum, and then we consider
+every other slot as a free slot, then we would overinflate in case
+of the recursive preemption. BUT! The trick is that if we use only
+every n-th slot as a free slot for n>2, there is always some worst
+inflation factor which we can guarantee not to be breached, even
+if we apply the algorithm recursively. To be precise, if for every
+k>1 normally used slots we accumulate one free delivery slot, then
+the inflation factor is not worse than k/(k-1) no matter how many
+recursive preemptions happen. And it's not worse than (k+1)/k if
+only non-recursive preemption happens. Now, having got through the
+theory and the related math, let's see how <tt>nqmgr</tt> implements
+this.
+
+</p>
+
+<p>
+
+Each job has a so-called "available delivery slot" counter. Each
+transport has a <a href="postconf.5.html#transport_delivery_slot_cost"><i>transport</i>_delivery_slot_cost</a> parameter, which
+defaults to the <a href="postconf.5.html#default_delivery_slot_cost">default_delivery_slot_cost</a> parameter, which is 5
+by default. This is the k from the paragraph above. Each time k
+entries of the job are selected for delivery, this counter is
+incremented by one. Once some slots are accumulated, a job which
+requires no more than that amount of slots to be fully delivered
+can preempt this job.
+
+</p>
+
+<p>
+
+[Well, the truth is, the counter is incremented every time an entry
+is selected, and it is divided by k when it is used. Or, more
+precisely, there is no division; the other side of the equation is
+multiplied by k. But for understanding, the above approximation of
+the truth is good enough.]
+
+</p>
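+
+<p> A small sketch of this bookkeeping in the multiply-by-k form
+just described (illustrative names and types, not the actual qmgr
+source): </p>
+
+<blockquote>
+<pre>
+/* Sketch of the "available delivery slot" bookkeeping. */
+typedef struct {
+    long    selected;           /* entries selected so far; this is k
+                                   times the accumulated-slot value */
+    long    entries_left;       /* entries still needed to finish */
+} SLOT_JOB;
+
+/* Called each time one of this job's entries is selected. */
+void    slot_select(SLOT_JOB *job)
+{
+    job->selected += 1;
+}
+
+/* May "candidate" preempt "current", given the slot cost k? */
+int     slot_can_preempt(SLOT_JOB *current, SLOT_JOB *candidate, long k)
+{
+    /* candidate->entries_left <= current->selected / k, no division */
+    return (candidate->entries_left * k <= current->selected);
+}
+</pre>
+</blockquote>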
+
+<p>
+
+OK, so now we know the conditions which must be satisfied for one
+job to preempt another one. But which job gets preempted, how do
+we choose which job preempts it if there are several valid candidates,
+and when exactly does all this happen?
+
+</p>
+
+<p>
+
+The answer to the first part is simple. The job whose entry was
+selected last is the so-called current job. Normally, it is
+the first job on the scheduler's job list, but destination concurrency
+limits may change this, as we will see later. It is always only the
+current job which may get preempted.
+
+</p>
+
+<p>
+
+Now for the second part. The current job has a certain amount of
+recipient entries, and as such may accumulate at maximum some amount
+of available delivery slots. It might have already accumulated some,
+and perhaps even already used some when it was preempted before
+(remember a job can be preempted several times). In either case,
+we know how many are accumulated and how many are left to deliver,
+so we know how many it may yet accumulate at maximum. Every other
+job which may be delivered by less than that amount of slots is a
+valid candidate for preemption. How do we choose among them?
+
+</p>
+
+<p>
+
+The answer is - the one with the maximum enqueue_time/recipient_entry_count.
+That is, the older the job is, the harder we should try to deliver
+it in order to get the best message delivery rates. These rates are
+of course subject to how many recipients the message has, therefore
+the division by the recipient (entry) count. No one shall be surprised
+that a message with n recipients takes n times longer to deliver than
+a message with one recipient.
+
+</p>
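+
+<p> A sketch of this comparison (illustrative only; "age" stands
+for the time the message has spent in the queue, and cross-multiplication
+keeps the test in integer arithmetic): </p>
+
+<blockquote>
+<pre>
+/* Prefer the candidate with the largest age/entry-count ratio. */
+typedef struct {
+    long    age;                /* how long the message has been queued */
+    long    entries;            /* (estimated) recipient entry count */
+} CANDIDATE;
+
+/* Nonzero when "a" is a better preemption candidate than "b". */
+int     better_candidate(CANDIDATE *a, CANDIDATE *b)
+{
+    /* a->age / a->entries > b->age / b->entries, cross-multiplied */
+    return (a->age * b->entries > b->age * a->entries);
+}
+</pre>
+</blockquote>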
+
+<p>
+
+Now let's recap the previous two paragraphs. Isn't it too complicated?
+Why aren't the candidates chosen only from among the jobs which can be
+delivered within the amount of slots the current job has already
+accumulated? Why do we need to estimate how much it has yet to
+accumulate? If you found out the answer, congratulate yourself. If
+we did it this simple way, we would always choose the candidate
+with the fewest recipient entries. If there were enough single-recipient
+mails coming in, they would always slip by the bulk mail as soon
+as possible, and mail with two or more recipients would never get
+a chance, no matter how long it has been sitting around in the
+job list.
+
+</p>
+
+<p>
+
+This candidate selection has an interesting implication - that when
+we choose the best candidate for preemption (this is done in
+qmgr_choose_candidate()), it may happen that we cannot use it for
+preemption immediately. This leads to an answer to the last part
+of the original question - when does the preemption happen?
+
+</p>
+
+<p>
+
+The preemption attempt happens every time the transport's next recipient
+entry is to be chosen for delivery. To avoid needless overhead, the
+preemption is not attempted if the current job could never accumulate
+more than <a href="postconf.5.html#transport_minimum_delivery_slots"><i>transport</i>_minimum_delivery_slots</a> (defaults to
+<a href="postconf.5.html#default_minimum_delivery_slots">default_minimum_delivery_slots</a> which defaults to 3). If enough
+slots have already accumulated to preempt the current job by the
+chosen best candidate, it is done immediately. This basically means
+that the candidate is moved in front of the current job on the
+scheduler's job list and the accumulated slot counter is decreased
+by the amount used by the candidate. If there are not enough slots...
+well, I could say that nothing happens and another preemption
+is attempted the next time. But that's not the complete truth.
+
+</p>
+
+<p>
+
+The truth is that it turns out that it is not really necessary to
+wait until the job's counter accumulates all the delivery slots in
+advance. Say we have a ten-recipient mail followed by two two-recipient
+mails. If the preemption happened when enough delivery slots accumulate
+(assuming slot cost 2), the delivery sequence becomes
+<tt>11112211113311</tt>. Now what would we get if we waited
+only for 50% of the necessary slots to accumulate, promising to
+wait for the remaining 50% later, after we get back
+to the preempted job? If we use such a slot loan, the delivery sequence
+becomes <tt>11221111331111</tt>. As we can see, this is not
+considerably worse for the delivery of the ten-recipient mail, but
+it allows the small messages to be delivered sooner.
+
+</p>
+
+<p>
+
+The concept of these slot loans is where the
+<a href="postconf.5.html#transport_delivery_slot_discount"><i>transport</i>_delivery_slot_discount</a> and
+<a href="postconf.5.html#transport_delivery_slot_loan"><i>transport</i>_delivery_slot_loan</a> come from (they default to
+<a href="postconf.5.html#default_delivery_slot_discount">default_delivery_slot_discount</a> and <a href="postconf.5.html#default_delivery_slot_loan">default_delivery_slot_loan</a>, whose
+values are by default 50 and 3, respectively). The discount (resp.
+loan) specifies how many percent (resp. how many slots) one "gets
+in advance", when the amount of slots required to deliver the best
+candidate is compared with the amount of slots the current job has
+accumulated so far.
+
+</p>
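+
+<p> A sketch of the resulting test, under the assumption that the
+discount and the loan both reduce the number of slots that must
+have accumulated before preemption may happen (illustrative, not
+the actual qmgr arithmetic): </p>
+
+<blockquote>
+<pre>
+/* Can we preempt now, given the discount (percent) and loan (slots)? */
+int     enough_slots(long accumulated, long needed,
+                             long discount_pct, long loan)
+{
+    long    required = needed - loan - (needed * discount_pct) / 100;
+
+    if (required < 0)
+        required = 0;
+    return (accumulated >= required);
+}
+</pre>
+</blockquote>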
+
+<p>
+
+And that pretty much concludes this chapter.
+
+</p>
+
+<p>
+
+[Now you should have a feeling that you pretty much understand the
+scheduler and the preemption, or at least that you will have it
+after you read the last chapter a couple more times. You shall clearly
+see the job list and the preemption happening at its head, in ideal
+delivery conditions. The feeling of understanding shall last until
+you start wondering what happens if some of the jobs are blocked,
+which you might eventually figure out correctly from what has been
+said already. But I would be surprised if your mental image of the
+scheduler's functionality is not completely shattered once you
+start wondering how it works when not all recipients may be read
+in-core. More on that later.]
+
+</p>
+
+<h3> <a name="<tt>nqmgr</tt>_concurrency"> How destination concurrency
+limits affect the scheduling algorithm </a> </h3>
+
+<p>
+
+The <tt>nqmgr</tt> uses the same algorithm for destination concurrency
+control as <tt>oqmgr</tt>. Now what happens when the destination
+limits are reached and no more entries for that destination may be
+selected by the scheduler?
+
+</p>
+
+<p>
+
+From the user's point of view it is all simple. If some of the peers
+of a job can't be selected, those peers are simply skipped by the
+entry selection algorithm (the pseudo-code described before) and
+only the selectable ones are used. If none of the peers may be
+selected, the job is declared a "blocker job". Blocker jobs are
+skipped by the entry selection algorithm and they are also excluded
+from the candidates for preemption of the current job. Thus the scheduler
+effectively behaves as if the blocker jobs didn't exist on the job
+list at all. As soon as at least one of the peers of a blocker job
+becomes unblocked (that is, the delivery agent handling the delivery
+of the recipient entry for the given destination successfully finishes),
+the job's blocker status is removed and the job again participates
+in all further scheduler actions normally.
+
+</p>
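+
+<p> A sketch of the blocker test just described (minimal illustrative
+types, not the qmgr declarations): </p>
+
+<blockquote>
+<pre>
+/* A job is a "blocker" when none of its peers may be selected. */
+typedef struct BQUEUE { int concurrency; int window; } BQUEUE;
+typedef struct BPEER { BQUEUE *queue; struct BPEER *next; } BPEER;
+typedef struct BJOB { BPEER *peers; } BJOB;
+
+int     job_is_blocker(BJOB *job)
+{
+    BPEER  *pp;
+
+    for (pp = job->peers; pp != 0; pp = pp->next)
+        if (pp->queue->concurrency < pp->queue->window)
+            return (0);                 /* selectable peer found */
+    return (1);                         /* all peers blocked */
+}
+</pre>
+</blockquote>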
+
+<p>
+
+So the summary is that users don't really have to be concerned
+about the interaction of the destination limits and the scheduling
+algorithm. It works well on its own and there are no knobs they
+would need to control it.
+
+</p>
+
+<p>
+
+From a programmer's point of view, the blocker jobs complicate the
+scheduler quite a lot. Without them, the jobs on the job list would
+be normally delivered in strict FIFO order. If the current job is
+preempted, the job preempting it is completely delivered unless it
+is preempted itself. Without blockers, the current job is thus
+always either the first job on the job list, or the top of the stack
+of jobs preempting the first job on the job list.
+
+</p>
+
+<p>
+
+The visualization of the job list and the preemption stack without
+blockers would be like this:
+
+</p>
+
+<blockquote>
+<pre>
+first job-> 1--2--3--5--6--8--... <- job list
+on job list |
+ 4 <- preemption stack
+ |
+current job-> 7
+</pre>
+</blockquote>
-<h3> <a name="job_design"> How the non-preemptive queue manager scheduler works </a> </h3>
+<p>
-<p> The following text is from Patrik Rak and should be read together
-with the <a href="postconf.5.html">postconf(5)</a> manual that describes each configuration
-parameter in detail. </p>
+In the example above we see that job 1 was preempted by job 4 and
+then job 4 was preempted by job 7. After job 7 is completed, the
+remaining entries of job 4 are selected, and once they are all
+selected, job 1 continues.
-<p> From user's point of view, <a href="qmgr.8.html">oqmgr(8)</a> and <a href="qmgr.8.html">qmgr(8)</a> are both the same,
-except for how next message is chosen when delivery agent becomes
-available. You already know that <a href="qmgr.8.html">oqmgr(8)</a> uses round-robin by destination
-while <a href="qmgr.8.html">qmgr(8)</a> uses simple FIFO, except for some preemptive magic.
-The <a href="postconf.5.html">postconf(5)</a> manual documents all the knobs the user
-can use to control this preemptive magic - there is nothing else
-to the preemption than the quite simple conditions described in there.
</p>
-<p> As for programmer-level documentation, this will have to be
-extracted from all those emails we have exchanged with Wietse [rats!
-I hoped that Patrik would do the work for me -- Wietse] But I think
-there are no missing bits which we have not mentioned in our
-conversations. </p>
+<p>
-<p> However, even from programmer's point of view, there is nothing
-more to add to the message scheduling idea itself. There are few
-things which make it look more complicated than it is, but the
-algorithm is the same as the user perceives it. The summary of the
-differences of the programmer's view from the user's view are: </p>
+As we see, it's all very clean and straightforward. Now how does
+this change because of blockers?
+
+</p>
+
+<p>
+
+The answer is: a lot. Any job may become a blocker job at any time,
+and become a normal job again at any time. This has several
+important implications:
+
+</p>
<ol>
- <li> <p> Simplification of terms for users: The user knows
- about messages and recipients. The program itself works with
- jobs (one message is split among several jobs, one per each
- transport needed to deliver the message) and queue entries
- (each entry may group several recipients for same destination).
- Then there is the peer structure introduced by <a href="qmgr.8.html">qmgr(8)</a> which is
- simply per-job analog of the queue structure. </p>
-
- <li> <p> Dealing with concurrency limits: The actual implementation
- is complicated by the fact that the messages (resp. jobs) may
- not be delivered in the exactly scheduled order because of the
- concurrency limits. It is necessary to skip some "blocker" jobs
- when the concurrency limit is reached and get back to them
- again when the limit permits. </p>
-
- <li> <p> Dealing with resource limits: The actual implementation is
- complicated by the fact that not all recipients may be read in-core.
- Therefore each message has some recipients in-core and some may
- remain on-file. This means that a) the preemptive algorithm needs
- to work with recipient count estimates instead of exact counts, b)
- there is extra code which needs to manipulate the per-transport
- pool of recipients which may be read in-core at the same time, and
- c) there is extra code which needs to be able to read recipients
- into core in batches and which is triggered at appropriate moments. </p>
-
- <li> <p> Doing things efficiently: All important things I am
- aware of are done in the minimum time possible (either directly
- or at least when amortized complexity is used), but to choose
- which job is the best candidate for preempting the current job
- requires linear search of up to all transport jobs (the worst
- theoretical case - the reality is much better). As this is done
- every time the next queue entry to be delivered is about to be
- chosen, it seemed reasonable to add cache which minimizes the
- overhead. Maintenance of this candidate cache slightly obfuscates
- things.
+<li> <p>
+
+The jobs may be completed in arbitrary order. For example, in the
+example above, if the current job 7 becomes blocked, the next job
+4 may complete before job 7 becomes unblocked again. Or if both
+7 and 4 are blocked, then 1 is completed, then 7 becomes unblocked
+and is completed, then 2 is completed and only after that 4 becomes
+unblocked and is completed... You get the idea.
+
+</p>
+
+<p>
+
+[Interesting side note: even when jobs are delivered out of order,
+from a single destination's point of view the jobs are still delivered
+in the expected order (that is, FIFO unless there was some preemption
+involved). This is because whenever a destination queue becomes
+unblocked (the destination limit allows selection of more recipient
+entries for that destination), all jobs which have peers for that
+destination are unblocked at once.]
+
+</p>
+
+<li> <p>
+
+The idea of the preemption stack at the head of the job list is
+gone. That is, it must be possible to preempt any job on the job
+list. For example, if jobs 7, 4, 1 and 2 in the example above
+all become blocked, job 3 becomes the current job. And of course
+we do not want the preemption to be affected by whether there
+are blocked jobs or not. Therefore, if it turns out that job
+3 might be preempted by job 6, the implementation shall make it
+possible.
+
+</p>
+
+<li> <p>
+
+The idea of the linear preemption stack itself is gone. It's no
+longer true that one job is always preempted by only one job at one
+time (that is, directly preempted, not counting the recursively
+nested jobs). For example, in the example above, job 1 is directly
+preempted by only job 4, and job 4 by job 7. Now assume job 7 becomes
+blocked, and job 4 is being delivered. If it accumulates enough
+delivery slots, it is natural that it might be preempted for example
+by job 8. Now job 4 is preempted by both job 7 AND job 8 at the
+same time.
+
+</p>
</ol>
-<p> The points 2 and 3 are those which made the implementation
-(look) complicated and were the real coding work, but I believe
-that to understand the scheduling algorithm itself (which was the
-real thinking work) is fairly easy. </p>
+<p>
+
+Now combine points 2) and 3) with point 1) again and you realize
+that the relations on the once linear job list become pretty
+complicated. If we extend the point 3) example: jobs 7 and 8 preempt
+job 4, now job 8 becomes blocked too, then job 4 completes. Tricky,
+huh?
+
+</p>
+
+<p>
+
+If I illustrate the relations after the above-mentioned examples
+(except those in point 1)), the situation would look like this:
+
+</p>
+
+<blockquote>
+<pre>
+ v- parent
+
+adoptive parent -> 1--2--3--5--... <- "stack" level 0
+ | |
+parent gone -> ? 6 <- "stack" level 1
+ / \
+children -> 7 8 ^- child <- "stack" level 2
+
+ ^- siblings
+</pre>
+</blockquote>
+
+<p>
+
+Now how does <tt>nqmgr</tt> deal with all these complicated relations?
+
+</p>
+
+<p>
+
+Well, it maintains them all as described, but fortunately, all these
+relations are necessary only for purposes of proper counting of
+available delivery slots. For purposes of ordering the jobs for
+entry selection, the original rule still applies: "the job preempting
+the current job is moved in front of the current job on the job
+list". So for entry selection purposes, the job relations remain
+as simple as this:
+
+</p>
+
+<blockquote>
+<pre>
+7--8--1--2--6--3--5--.. <- scheduler's job list order
+</pre>
+</blockquote>
+
+<p>
+
+The job list order and the preemption parent/child/siblings relations
+are maintained separately. And because the selection works only
+with the job list, you can happily forget about those complicated
+relations unless you want to study the <tt>nqmgr</tt> sources. In
+that case the text above might provide some helpful introduction
+to the problem domain. Otherwise I suggest you just forget about
+all this and stick with the user's point of view: the blocker jobs
+are simply ignored.
+
+</p>
+
+<p>
+
+[By now, you should have a feeling that there are more things going
+on under the hood than you ever wanted to know. You decide that
+forgetting about this chapter is the best you can do for the sake
+of your mind's health and you basically stick with the idea of how the
+scheduler works in ideal conditions, when there are no blockers,
+which is good enough.]
+
+</p>
+
+<h3> <a name="<tt>nqmgr</tt>_memory"> Dealing with memory resource
+limits </a> </h3>
+
+<p>
+
+When discussing the <tt>nqmgr</tt> scheduler, we have so far assumed
+that all recipients of all messages in the <a href="QSHAPE_README.html#active_queue">active queue</a> are completely
+read into memory. This is simply not true. There is an upper
+bound on the amount of memory the <tt>nqmgr</tt> may use, and
+therefore it must impose some limits on the information it may store
+in memory at any given time.
+
+</p>
+
+<p>
+
+First of all, not all messages may be read in-core at once. At any
+time, at most <a href="postconf.5.html#qmgr_message_active_limit">qmgr_message_active_limit</a> messages may be read
+in-core. When read into memory, the messages are picked from the
+<a href="QSHAPE_README.html#incoming_queue">incoming</a> and deferred message queues and moved to the <a href="QSHAPE_README.html#active_queue">active queue</a>
+(incoming having priority), so if more than
+<a href="postconf.5.html#qmgr_message_active_limit">qmgr_message_active_limit</a> messages are queued for delivery, the
+rest will have to wait until (some of) the messages in the active
+queue are completely delivered (or deferred).
+
+</p>
+
+<p>
+
+Even with the limited amount of in-core messages, there is another
+limit which must be imposed in order to avoid memory exhaustion.
+Each message may contain a huge number of recipients (tens or hundreds
+of thousands are not uncommon), so if <tt>nqmgr</tt> were to read all
+recipients of all messages in the <a href="QSHAPE_README.html#active_queue">active queue</a>, it might easily run
+out of memory. Therefore there must be some upper bound on the
+number of message recipients which are read into memory at the
+same time.
+
+</p>
+
+<p>
+
+Before discussing how exactly <tt>nqmgr</tt> implements the recipient
+limits, let's see how the sole existence of the limits themselves
+affects the <tt>nqmgr</tt> and its scheduler.
+
+</p>
+
+<p>
+
+The message limit is straightforward - it just limits the size of
+the lookahead the <tt>nqmgr</tt>'s scheduler has when choosing which
+message can preempt the current one. Messages not in the active
+queue are simply not considered at all.
+
+</p>
+
+<p>
+
+The recipient limit complicates more things. First of all, the
+message reading code must support reading the recipients in batches,
+which among other things means accessing the queue file several
+times and continuing where the last recipient batch ended. This is
+invoked by the scheduler whenever the current job runs out of in-core
+recipients and more are required. It is also done whenever all
+in-core recipients of the message are dealt with (which may also
+mean they were deferred) but there are still more in the queue file.
+
+</p>
+
+<p>
+
+The second complication is that with some recipients left unread
+in the queue file, the scheduler can't operate with exact counts
+of recipient entries. With unread recipients, it is not clear how
+many recipient entries there will be, as they are subject to
+per-destination grouping. It is not even clear to what transports
+(and thus jobs) the recipients will be assigned. And with messages
+coming from the <a href="QSHAPE_README.html#deferred_queue">deferred queue</a>, it is not even clear how many unread
+recipients are still to be delivered. This all means that the
+scheduler must use only estimates of how many recipient entries
+there will be. Fortunately, it is possible to estimate the minimum
+and maximum correctly, so the scheduler can always err on the safe
+side. Obviously, the better the estimates, the better the results, so
+it is best when we are able to read all recipients in-core and turn
+the estimates into exact counts, or at least try to read as many
+as possible to make the estimates as accurate as possible.
+
+</p>
+
+<p>
+
+The third complication is that it is no longer true that the scheduler
+is done with a job once all of its in-core recipients are delivered.
+It is possible that the job will be revived later, when another
+batch of recipients is read in core. It is also possible that some
+jobs will be created for the first time long after the first batch
+of recipients was read in core. The <tt>nqmgr</tt> code must be
+ready to handle all such situations.
+
+</p>
+
+<p>
+
+And finally, the fourth complication is that the <tt>nqmgr</tt>
+code must somehow impose the recipient limit itself. Now how does
+it achieve this?
+
+</p>
+
+<p>
+
+Perhaps the easiest solution would be to say that each message may
+have at maximum X recipients stored in-core, but such a solution would
+be poor for several reasons. With reasonable <a href="postconf.5.html#qmgr_message_active_limit">qmgr_message_active_limit</a>
+values, the X would have to be quite low to maintain a reasonable
+memory footprint. And with a low X, lots of things would not work well.
+The <tt>nqmgr</tt> would have problems using the
+<a href="postconf.5.html#transport_destination_recipient_limit"><i>transport</i>_destination_recipient_limit</a> efficiently. The
+scheduler's preemption would be suboptimal as the recipient count
+estimates would be inaccurate. The message queue file would have
+to be accessed many times to read in more recipients again and
+again.
+
+</p>
+
+<p>
+
+Therefore it seems reasonable to have a solution which does not use
+a limit imposed on a per-message basis, but which maintains a pool
+of available recipient slots, which can be shared among all messages
+in the most efficient manner. And as we do not want separate
+transports to compete for resources whenever possible, it seems
+appropriate to maintain such a recipient pool for each transport
+separately. This is the general idea, now how does it work in
+practice?
+
+</p>
+
+<p>
+
+First we have to solve a little chicken-and-egg problem. If we want
+to use the per-transport recipient pools, we first need to know to
+what transport(s) the message is assigned. But we will find that
+out only after we read in the recipients. So it is obvious
+that we first have to read in some recipients, use them to find out
+to what transports the message is to be assigned, and only after
+that can we use the per-transport recipient pools.
+
+</p>
+
+<p>
+
+Now how many recipients shall we read for the first time? This is
+what <a href="postconf.5.html#qmgr_message_recipient_minimum">qmgr_message_recipient_minimum</a> and <a href="postconf.5.html#qmgr_message_recipient_limit">qmgr_message_recipient_limit</a>
+values control. The <a href="postconf.5.html#qmgr_message_recipient_minimum">qmgr_message_recipient_minimum</a> value specifies
+how many recipients of each message we will read for the first time,
+no matter what. It is necessary to read at least one recipient
+before we can assign the message to a transport and create the first
+job. However, reading only <a href="postconf.5.html#qmgr_message_recipient_minimum">qmgr_message_recipient_minimum</a> recipients
+even if there are only a few messages with a few recipients in-core
+would be wasteful. Therefore if there are fewer than <a href="postconf.5.html#qmgr_message_recipient_limit">qmgr_message_recipient_limit</a>
+recipients in-core so far, the first batch of recipients may be
+larger than <a href="postconf.5.html#qmgr_message_recipient_minimum">qmgr_message_recipient_minimum</a> - as large as is required
+to reach the <a href="postconf.5.html#qmgr_message_recipient_limit">qmgr_message_recipient_limit</a> limit.
+
+</p>
+
+<p>
+
+Once the first batch of recipients has been read in core and the message
+jobs were created, the size of the subsequent recipient batches (if
+any - of course it's best when all recipients are read in one batch)
+is based solely on the position of the message jobs on their
+corresponding transport's job lists. Each transport has a pool of
+<a href="postconf.5.html#transport_recipient_limit"><i>transport</i>_recipient_limit</a> recipient slots which it can
+distribute among its jobs (how this is done is described later).
+The subsequent recipient batch may be as large as the sum of all
+recipient slots of all jobs of the message permits (plus the
+<a href="postconf.5.html#qmgr_message_recipient_minimum">qmgr_message_recipient_minimum</a> amount which always applies).
+
+</p>
+
+<p>
+
+For example, if a message has three jobs, the first with 1 recipient
+still in-core and 4 recipient slots, the second with 5 recipients
+in-core and 5 recipient slots, and the third with 2 recipients in-core
+and 0 recipient slots, it has 1+5+2=7 recipients in-core and 4+5+0=9
+job recipient slots in total. This means that we could immediately
+read 2+<a href="postconf.5.html#qmgr_message_recipient_minimum">qmgr_message_recipient_minimum</a> more recipients of that message
+in core.
+
+</p>
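+
+<p> The arithmetic of this example as a tiny self-contained sketch
+(the value 100 for qmgr_message_recipient_minimum is just an
+illustrative default): </p>
+
+<blockquote>
+<pre>
+#include &lt;stdio.h&gt;
+
+int     main(void)
+{
+    long    in_core[] = {1, 5, 2};      /* recipients in-core, per job */
+    long    slots[] = {4, 5, 0};        /* recipient slots, per job */
+    long    minimum = 100;              /* qmgr_message_recipient_minimum */
+    long    i, total_in_core = 0, total_slots = 0;
+
+    for (i = 0; i < 3; i++) {
+        total_in_core += in_core[i];
+        total_slots += slots[i];
+    }
+    /* 9 - 7 + minimum = 2 + minimum more recipients may be read. */
+    printf("may read %ld more recipients\n",
+           total_slots - total_in_core + minimum);
+    return (0);
+}
+</pre>
+</blockquote>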
+
+<p>
+
+The above example illustrates several things which might be worth
+mentioning explicitly: first, note that although the per-transport
+slots are assigned to particular jobs, we can't guarantee that,
+once the next batch of recipients is read in core, the corresponding
+amounts of recipients will be assigned to those jobs. The jobs lend
+their slots to the message as a whole, so it is possible that some
+jobs end up sponsoring other jobs of their message. For example,
+if in the example above the 2 newly read recipients were assigned
+to the second job, the first job sponsored the second job with 2
+slots. The second notable thing is the third job, which has more
+recipients in-core than it has slots. Apart from the sponsoring by
+another job which we just saw, this can be the result of the first
+recipient batch, which is sponsored from the global recipient pool
+of <a href="postconf.5.html#qmgr_message_recipient_limit">qmgr_message_recipient_limit</a>
+recipients. It can also be sponsored from the message recipient
+pool of <a href="postconf.5.html#qmgr_message_recipient_minimum">qmgr_message_recipient_minimum</a> recipients.
+
+</p>
+
+<p>
+
+Now how does each transport distribute the recipient slots among
+its jobs? The strategy is quite simple. As most scheduler activity
+happens on the head of the job list, it is our intention to make
+sure that the scheduler has the best estimates of the recipient
+counts for those jobs. As we mentioned above, this means that we
+want to try to make sure that the messages of those jobs have all
+recipients read in-core. Therefore the transport distributes the
+slots "along" the job list from start to end. In this case the job
+list sorted by message enqueue time is used, because it doesn't
+change over time as the scheduler's job list does.
+
+</p>
+
+<p>
+
+More specifically, each time a job is created and appended to the
+job list, it gets all unused recipient slots from its transport's
+pool. It keeps them until all recipients of its message are read.
+When this happens, all unused recipient slots are transferred to
+the next job (which is in fact now the first such job) on the job
+list which still has some recipients unread, or eventually back to
+the transport pool if there is no such job. Such a transfer also
+happens whenever a recipient entry of that job is delivered.
+
+</p>
+
+<p>
+
+There is also a scenario when a job is not appended to the end of
+the job list (for example, it was created as a result of a second or
+later recipient batch). Then it works exactly as above, except that
+if it was put in front of the first unread job (that is, the job
+of a message which still has some unread recipients in the queue file),
+that job is first forced to return all of its unused recipient slots
+to the transport pool.
+
+</p>
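+
+<p> A sketch of this transfer rule (illustrative types; the job
+list here is the one ordered by enqueue time): </p>
+
+<blockquote>
+<pre>
+/* When a job has read all recipients of its message, its unused
+ * slots move to the first job that still has unread recipients,
+ * or back to the transport pool when there is no such job. */
+typedef struct RJOB {
+    int     unread;             /* recipients still in the queue file */
+    long    slots;              /* unused recipient slots held */
+    struct RJOB *next;          /* job list ordered by enqueue time */
+} RJOB;
+
+void    transfer_slots(RJOB *job, long *transport_pool)
+{
+    RJOB   *jp;
+
+    if (job->unread > 0)
+        return;                         /* still reading: keep the slots */
+    for (jp = job->next; jp != 0; jp = jp->next) {
+        if (jp->unread > 0) {
+            jp->slots += job->slots;    /* first unread job gets them */
+            job->slots = 0;
+            return;
+        }
+    }
+    *transport_pool += job->slots;      /* no unread job: back to pool */
+    job->slots = 0;
+}
+</pre>
+</blockquote>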
+
+<p>
+
+The algorithm just described leads to the following state: The first
+unread job on the job list always gets all the remaining recipient
+slots of that transport (if there are any). The jobs queued before
+this job are completely read (that is, all recipients of their
+message were already read in core) and have at maximum as many slots
+as they still have recipients in-core (the maximum is there because
+of the sponsoring mentioned before) and the jobs after this job get
+nothing from the transport recipient pool (unless they got something
+before and then the first unread job was created and enqueued in
+front of them later - in which case they also get at maximum as many
+slots as they have recipients in-core).
+
+</p>
+
+<p>
+
+Things work fine in such a state for most of the time, because the
+current job is either completely read in-core or has as many recipient
+slots as there are, but there is one situation which we still have
+to take special care of. Imagine that the current job is preempted
+by some unread job from the job list and there are no more recipient
+slots available, so this new current job could read only batches
+of <a href="postconf.5.html#qmgr_message_recipient_minimum">qmgr_message_recipient_minimum</a> recipients at a time. This would
+really degrade performance. For this reason, each transport has an
+extra pool of <a href="postconf.5.html#transport_extra_recipient_limit"><i>transport</i>_extra_recipient_limit</a> recipient
+slots, dedicated exactly for this situation. Each time an unread
+job preempts the current job, it gets half of the remaining recipient
+slots from the normal pool and this extra pool.
+
+</p>
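+
+<p> A sketch of this rule, under the assumption that "half of the
+remaining recipient slots" means half of each pool (the text leaves
+the exact split open): </p>
+
+<blockquote>
+<pre>
+/* Slots granted to an unread job that preempts the current job. */
+long    preemption_slot_grant(long *normal_pool, long *extra_pool)
+{
+    long    grant = *normal_pool / 2 + *extra_pool / 2;
+
+    *normal_pool -= *normal_pool / 2;
+    *extra_pool -= *extra_pool / 2;
+    return (grant);
+}
+</pre>
+</blockquote>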
+
+<p>
+
+And that's it. It sure does sound pretty complicated, but fortunately
+most people don't really have to care how exactly it works as long
+as it works. Perhaps the only important things to know for most
+people are the following upper bound formulas:
+
+</p>
+
+<p>
+
+Each transport has at maximum
+
+</p>
+
+<blockquote>
+<pre>
+max(
+<a href="postconf.5.html#qmgr_message_recipient_minimum">qmgr_message_recipient_minimum</a> * <a href="postconf.5.html#qmgr_message_active_limit">qmgr_message_active_limit</a>
++ *_recipient_limit + *_extra_recipient_limit,
+<a href="postconf.5.html#qmgr_message_recipient_limit">qmgr_message_recipient_limit</a>
+)
+</pre>
+</blockquote>
+
+<p>
+
+recipients in core.
+
+</p>
+
+<p>
+
+The total amount of recipients in core is
+
+</p>
+
+<blockquote>
+<pre>
+max(
+<a href="postconf.5.html#qmgr_message_recipient_minimum">qmgr_message_recipient_minimum</a> * <a href="postconf.5.html#qmgr_message_active_limit">qmgr_message_active_limit</a>
++ sum( *_recipient_limit + *_extra_recipient_limit ),
+<a href="postconf.5.html#qmgr_message_recipient_limit">qmgr_message_recipient_limit</a>
+)
+</pre>
+</blockquote>
+
+<p>
+
+where the sum is over all used transports.
+
+</p>
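+
+<p> As a sketch, the total bound can be computed like this (the
+parameter values would come from main.cf; the names here are just
+function arguments): </p>
+
+<blockquote>
+<pre>
+/* Upper bound on recipients in core, per the formula above. */
+long    total_recipient_bound(long recipient_minimum, long active_limit,
+                                      long slot_sum, long extra_slot_sum,
+                                      long recipient_limit)
+{
+    long    bound = recipient_minimum * active_limit
+        + slot_sum + extra_slot_sum;
+
+    return (bound > recipient_limit ? bound : recipient_limit);
+}
+</pre>
+</blockquote>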
+
+<p>
+
+And this terribly complicated chapter concludes the documentation
+of the <tt>nqmgr</tt> scheduler.
+
+</p>
+
+<p>
+
+[By now you should theoretically know the <tt>nqmgr</tt> scheduler
+inside out. In practice, you still hope that you will never have
+to really understand the last or last two chapters completely, and
+fortunately most people really won't. Understanding how the scheduler
+works in ideal conditions is more than good enough for the vast
+majority of users.]
+
+</p>
<h2> <a name="credits"> Credits </a> </h2>
<li> These simplifications, and their modular implementation, helped
to develop further insights into the different roles that positive
-and negative concurrency feedback play, and helped to avoid all the
-known worst-case scenarios.
+and negative concurrency feedback play, and helped to identify some
+worst-case scenarios.
</ul>
/etc/postfix/<a href="postconf.5.html">main.cf</a>:
<a href="postconf.5.html#smtp_tls_CAfile">smtp_tls_CAfile</a> = /etc/postfix/cacert.pem
<a href="postconf.5.html#smtp_tls_session_cache_database">smtp_tls_session_cache_database</a> =
- btree:/var/spool/postfix/smtp_tls_session_cache
+ btree:/var/lib/postfix/smtp_tls_session_cache
<a href="postconf.5.html#smtp_use_tls">smtp_use_tls</a> = yes
<a href="postconf.5.html#smtpd_tls_CAfile">smtpd_tls_CAfile</a> = /etc/postfix/cacert.pem
<a href="postconf.5.html#smtpd_tls_cert_file">smtpd_tls_cert_file</a> = /etc/postfix/FOO-cert.pem
<a href="postconf.5.html#smtpd_tls_key_file">smtpd_tls_key_file</a> = /etc/postfix/FOO-key.pem
<a href="postconf.5.html#smtpd_tls_received_header">smtpd_tls_received_header</a> = yes
<a href="postconf.5.html#smtpd_tls_session_cache_database">smtpd_tls_session_cache_database</a> =
- btree:/var/spool/postfix/smtpd_tls_session_cache
+ btree:/var/lib/postfix/smtpd_tls_session_cache
<a href="postconf.5.html#tls_random_source">tls_random_source</a> = dev:/dev/urandom
# Postfix 2.3 and later
<a href="postconf.5.html#smtpd_tls_security_level">smtpd_tls_security_level</a> = may
<i>number</i> equal to "1", a destination's delivery concurrency
is decremented by 1 after each failed pseudo-cohort. </dd>
-<dt> <b><i>number</i> / sqrt_concurrency </b> </dt>
-
-<dd> Variable feedback of "<i>number</i> / sqrt(delivery concurrency)".
-The <i>number</i> must be in the range 0..1 inclusive. This setting
-may be removed in a future version. </dd>
-
</dl>
<p> A pseudo-cohort is the number of deliveries equal to a destination's
<i>number</i> equal to "1", a destination's delivery concurrency
is incremented by 1 after each successful pseudo-cohort. </dd>
-<dt> <b><i>number</i> / sqrt_concurrency </b> </dt>
-
-<dd> Variable feedback of "<i>number</i> / sqrt(delivery concurrency)".
-The <i>number</i> must be in the range 0..1 inclusive. This setting
-may be removed in a future version. </dd>
-
</dl>
<p> A pseudo-cohort is the number of deliveries equal to a destination's
The \fInumber\fR must be in the range 0..1 inclusive. With
\fInumber\fR equal to "1", a destination's delivery concurrency
is decremented by 1 after each failed pseudo-cohort.
-.IP "\fB\fInumber\fR / sqrt_concurrency \fR"
-Variable feedback of "\fInumber\fR / sqrt(delivery concurrency)".
-The \fInumber\fR must be in the range 0..1 inclusive. This setting
-may be removed in a future version.
.PP
A pseudo-cohort is the number of deliveries equal to a destination's
delivery concurrency.
The \fInumber\fR must be in the range 0..1 inclusive. With
\fInumber\fR equal to "1", a destination's delivery concurrency
is incremented by 1 after each successful pseudo-cohort.
-.IP "\fB\fInumber\fR / sqrt_concurrency \fR"
-Variable feedback of "\fInumber\fR / sqrt(delivery concurrency)".
-The \fInumber\fR must be in the range 0..1 inclusive. This setting
-may be removed in a future version.
.PP
A pseudo-cohort is the number of deliveries equal to a destination's
delivery concurrency.
# - Process input as text blocks separated by one or more empty
# (or all whitespace) lines.
#
+# - Skip text between <!-- and -->; each must be on a different line.
+#
# - Don't touch blocks that start with `<' in column zero.
#
# The only changes made are:
# Gobble up the next text block.
$block = "";
+ $comment = 0;
do {
$_ =~ s/\s+\n$/\n/;
$block .= $_;
- } while(($_ = <>) && /\S/);
+ if ($_ =~ /<!--/)
+ { $comment = 1; }
+ if ($comment && $_ =~ /-->/)
+ { $comment = 0; $block =~ s/<!--.*-->//sg; }
+ } while((($_ = <>) && /\S/) || $comment);
+
+ # Skip blanks after comment elimination.
+ if ($block =~ /^\s/) {
+ $block =~ s/^\s+//s;
+ next if ($block eq "");
+ }
# Don't touch a text block starting with < in column zero.
if ($block =~ /^</) {
<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN"
- "http://www.w3.org/TR/html4/loose.dtd">
+ "http://www.w3.org/TR/html4/loose.dtd">
<html>
<body>
-<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix Queue Scheduler</h1>
+<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix
+Queue Scheduler</h1>
<hr>
the last delivery attempt. There are two major classes of mechanisms
that control the operation of the queue manager. </p>
-<p> The first class of mechanisms is concerned with the number of
-concurrent deliveries to a specific destination, including decisions
-on when to suspend deliveries after persistent failures: </p>
-
- <ul>
-
- <li> <a href="#concurrency"> Concurrency scheduling </a>
-
- <ul>
-
- <li> <a href="#concurrency_summary_2_5"> Summary of the
- Postfix 2.5 concurrency feedback algorithm </a>
-
- <li> <a href="#dead_summary_2_5"> Summary of the Postfix
- 2.5 "dead destination" detection algorithm </a>
-
- <li> <a href="#pseudo_code_2_5"> Pseudocode for the Postfix
- 2.5 concurrency scheduler </a>
+<ul>
- <li> <a href="#concurrency_results"> Results for delivery
- to concurrency limited servers </a>
+<li> <a href="#concurrency"> Concurrency scheduling </a> is concerned
+with the number of concurrent deliveries to a specific destination,
+including decisions on when to suspend deliveries after persistent
+failures.
- <li> <a href="#concurrency_discussion"> Discussion of
- concurrency limited server results </a>
+<li> <a href="#jobs"> Preemptive scheduling </a> is concerned with
+the selection of email messages and recipients for a given destination.
- <li> <a href="#concurrency_limitations"> Limitations of
- less-than-1 per delivery feedback </a>
+<li> <a href="#credits"> Credits </a>. This document would not be
+complete without.
- <li> <a href="#concurrency_config"> Concurrency configuration
- parameters </a>
+</ul>
- </ul>
+<!--
- </ul>
+<p> Once started, the qmgr(8) process runs until "postfix reload"
+or "postfix stop". As a persistent process, the queue manager has
+to meet strict requirements with respect to code correctness and
+robustness. Unlike non-persistent daemon processes, the queue manager
+cannot benefit from Postfix's process rejuvenation mechanism that
+limits the impact of resource leaks and other coding errors
+(translation: replacing a process after a short time covers up bugs
+before they can become a problem). </p>
-<p> The second class of mechanisms is concerned with the selection
-of what mail to deliver to a given destination: </p>
+-->
- <ul>
+<h2> <a name="concurrency"> Concurrency scheduling </a> </h2>
- <li> <a href="#jobs"> Preemptive scheduling </a>
+<p> The following sections document the Postfix 2.5 concurrency
+scheduler, after a discussion of the limitations of the existing
+concurrency scheduler. This is followed by results of medium-concurrency
+experiments, and a discussion of trade-offs between performance and
+robustness. </p>
- <ul>
+<p> The material is organized as follows: </p>
- <li> <a href="#job_motivation"> Why the non-preemptive Postfix queue
- manager was replaced </a>
+<ul>
- <li> <a href="#job_design"> How the non-preemptive queue manager
- scheduler works </a>
+<li> <a href="#concurrency_drawbacks"> Drawbacks of the existing
+concurrency scheduler </a>
- </ul>
+<li> <a href="#concurrency_summary_2_5"> Summary of the Postfix 2.5
+concurrency feedback algorithm </a>
- </ul>
+<li> <a href="#dead_summary_2_5"> Summary of the Postfix 2.5 "dead
+destination" detection algorithm </a>
-<p> And this document would not be complete without: </p>
+<li> <a href="#pseudo_code_2_5"> Pseudocode for the Postfix 2.5
+concurrency scheduler </a>
- <ul>
+<li> <a href="#concurrency_results"> Results for delivery to
+concurrency limited servers </a>
- <li> <a href="#credits"> Credits </a>
+<li> <a href="#concurrency_discussion"> Discussion of concurrency
+limited server results </a>
- </ul>
+<li> <a href="#concurrency_limitations"> Limitations of less-than-1
+per delivery feedback </a>
-<!--
+<li> <a href="#concurrency_config"> Concurrency configuration
+parameters </a>
-<p> Once started, the qmgr(8) process runs until "postfix reload"
-or "postfix stop". As a persistent process, the queue manager has
-to meet strict requirements with respect to code correctness and
-robustness. Unlike non-persistent daemon processes, the queue manager
-cannot benefit from Postfix's process rejuvenation mechanism that
-limit the impact from resource leaks and other coding errors
-(translation: replacing a process after a short time covers up bugs
-before they can become a problem). </p>
-
--->
+</ul>
-<h2> <a name="concurrency"> Concurrency scheduling </a> </h2>
+<h3> <a name="concurrency_drawbacks"> Drawbacks of the existing
+concurrency scheduler </a> </h3>
-<p> This section documents the Postfix 2.5 concurrency scheduler.
-Prior Postfix versions used a simple but robust algorithm where the
-per-destination delivery concurrency was decremented by 1 after a
-delivery suffered connection or handshake failure, and was incremented
-by 1 otherwise. Of course the concurrency was never allowed to
-exceed the maximum per-destination concurrency limit. And when a
-destination's concurrency level dropped to zero, the destination
-was declared "dead" and delivery was suspended. </p>
+<p> From the start, Postfix has used a simple but robust algorithm
+where the per-destination delivery concurrency is decremented by 1
+after a delivery suffered connection or handshake failure, and
+incremented by 1 otherwise. Of course the concurrency is never
+allowed to exceed the maximum per-destination concurrency limit.
+And when a destination's concurrency level drops to zero, the
+destination is declared "dead" and delivery is suspended. </p>
-<p> Drawbacks of the old +/-1 feedback per delivery are: <p>
+<p> Drawbacks of +/-1 concurrency feedback per delivery are: </p>
<ul>
<li> <p> Overshoot due to exponential delivery concurrency growth
-with each pseudo-cohort(*). For example, with the default initial
-concurrency of 5, concurrency would proceed over time as (5-10-20).
-</p>
+with each pseudo-cohort(*). This can be an issue with high-concurrency
+channels. For example, with the default initial concurrency of 5,
+concurrency would proceed over time as (5-10-20). </p>
<li> <p> Throttling down to zero concurrency after a single
pseudo-cohort(*) failure. This was especially an issue with
configurable number of pseudo-cohorts reports connection or handshake
failure. </p>
-<h3> <a name="concurrency_summary_2_5"> Summary of the Postfix 2.5 concurrency feedback algorithm </a> </h3>
+<h3> <a name="concurrency_summary_2_5"> Summary of the Postfix 2.5
+concurrency feedback algorithm </a> </h3>
<p> We want to increment a destination's delivery concurrency when
some (not necessarily consecutive) number of deliveries complete
<p> All results in the previous sections are based on the first
delivery runs only; they do not include any second etc. delivery
-attempts. The first two examples show that the feedback method
-matters little when concurrency is limited due to congestion. This
+attempts. The first two examples show that the effect of feedback
+is negligible when concurrency is limited due to congestion. This
is because the initial concurrency is already at the client's
concurrency maximum, and because there is 10-100 times more positive
-than negative feedback. Under these conditions, the contribution
-from SMTP connection caching is negligible. </p>
+than negative feedback. Under these conditions, it is no surprise
+that the contribution from SMTP connection caching is also negligible.
+</p>
<p> In the last example, the old +/-1 feedback per delivery will
defer 50% of the mail when confronted with an active (anvil-style)
<h2> <a name="jobs"> Preemptive scheduling </a> </h2>
-<p> This is the beginning of documentation for a preemptive queue
-manager scheduling algorithm by Patrik Rak. For a long time, this
-code was made available under the name "nqmgr(8)" (new queue manager),
-as an optional module. As of Postfix 2.1 this is the default queue
-manager, which is always called "qmgr(8)". The old queue manager
-will for some time will be available under the name of "oqmgr(8)".
+<p>
+
+This document attempts to describe the new queue manager and its
+preemptive scheduler algorithm. Note that the document was originally
+written to describe the changes between the new queue manager (in
+this text referred to as <tt>nqmgr</tt>, the name it was known by
+before it became the default queue manager) and the old queue manager
+(referred to as <tt>oqmgr</tt>). This is why it refers to <tt>oqmgr</tt>
+every so often.
+
</p>
-<h3> <a name="job_motivation"> Why the non-preemptive Postfix queue manager was replaced </a> </h3>
+<p>
-<p> The non-preemptive Postfix scheduler had several limitations
-due to unfortunate choices in its design. </p>
+This document is divided into sections as follows:
-<ol>
+</p>
- <li> <p> Round-robin selection by destination for mail that is
- delivered via the same message delivery transport. The round-robin
- strategy was chosen with the intention to prevent a single
- (destination) site from using up too many mail delivery resources.
- However, that strategy penalized inbound mail on bi-directional
- gateways. The poor suffering inbound destination would be
- selected only 1/number-of-destinations of the time, even when
- it had more mail than other destinations, and thus mail could
- be delayed. </p>
-
- <p> Victor Duchovni found a workaround: use different message
- delivery transports, and thus avoid the starvation problem.
- The Patrik Rak scheduler solves this problem by using FIFO
- selection. </p>
-
- <li> <p> A second limitation of the old Postfix scheduler was
- that delivery of bulk mail would block all other deliveries,
- causing large delays. Patrik Rak's scheduler allows mail with
- fewer recipients to slip past bulk mail in an elegant manner.
- </p>
+<ul>
-</ol>
+<li> <a href="#<tt>nqmgr</tt>_structures"> The structures used by
+nqmgr </a>
+
+<li> <a href="#<tt>nqmgr</tt>_pickup"> What happens when nqmgr picks
+up the message </a> - how it is assigned to transports, jobs, peers,
+entries
+
+<li> <a href="#<tt>nqmgr</tt>_selection"> How does the entry selection
+work </a>
+
+<li> <a href="#<tt>nqmgr</tt>_preemption"> How does the preemption
+work </a> - what messages may be preempted and how and what messages
+are chosen to preempt them
+
+<li> <a href="#<tt>nqmgr</tt>_concurrency"> How destination concurrency
+limits affect the scheduling algorithm </a>
+
+<li> <a href="#<tt>nqmgr</tt>_memory"> Dealing with memory resource
+limits </a>
+
+</ul>
+
+<h3> <a name="<tt>nqmgr</tt>_structures"> The structures used by
+nqmgr </a> </h3>
+
+<p>
+
+Let's start by recapitulating the structures and terms used when
+referring to the queue manager and how it operates. Many of these are
+partially described elsewhere, but it is nice to have a coherent
+overview in one place:
+
+</p>
+
+<ul>
+
+<li> <p> Each message structure represents one mail message which
+Postfix is to deliver. The message recipients specify to what
+destinations the message is to be delivered and what transports are
+going to be used for the delivery. </p>
+
+<li> <p> Each recipient entry groups a batch of recipients of one
+message which are all going to be delivered to the same destination.
+</p>
+
+<li> <p> Each transport structure groups everything that is going
+to be delivered by the delivery agents dedicated to that transport.
+Each transport maintains a set of queues (describing the destinations
+it shall talk to) and jobs (referencing the messages it shall
+deliver). </p>
+
+<li> <p> Each transport queue (not to be confused with the on-disk
+active queue or incoming queue) groups everything that is going to be
+delivered to a given destination (aka nexthop) by its transport. Each
+queue belongs to one transport, so each destination may be referred
+to by several queues, one for each transport. Each queue maintains
+a list of all recipient entries (batches of message recipients)
+which shall be delivered to the given destination (the todo list), and
+a list of recipient entries already being delivered by the delivery
+agents (the busy list). </p>
+
+<li> <p> Each queue corresponds to multiple peer structures. Each
+peer structure is like the queue structure, belonging to one transport
+and referencing one destination. The difference is that it lists
+only the recipient entries which all originate from the same message,
+unlike the queue structure, whose entries may originate from various
+messages. For messages with few recipients, there is usually just
+one recipient entry for each destination, resulting in one recipient
+entry per peer. But for large mailing list messages the recipients
+may need to be split into multiple recipient entries, in which case
+the peer structure may list many entries for a single destination.
+</p>
+
+<li> <p> Each transport job groups everything it takes to deliver
+one message via its transport. Each job represents one message
+within the context of the transport. The job belongs to one transport
+and message, so each message may have multiple jobs, one for each
+transport. The job groups all the peer structures, which describe
+the destinations the job's message has to be delivered to. </p>
+
+</ul>
+
+<p>
+
+The first four structures are common to both <tt>nqmgr</tt> and
+<tt>oqmgr</tt>; the latter two were introduced by <tt>nqmgr</tt>.
+
+</p>
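+
+<p>
+
+For readers who prefer code to prose, the relations above can be
+summarized by a sketch like the following. This is illustrative
+only; the type and field names are invented here and do not match
+the actual qmgr sources:
+
+</p>
+
+<blockquote>
+<pre>
+/* Illustrative sketch, not the actual Postfix declarations. */
+struct entry {                      /* recipient batch, one destination */
+    struct queue     *queue;        /* queue whose todo/busy list holds it */
+    struct peer      *peer;         /* per-job view of the same destination */
+    struct entry     *queue_next;   /* next entry on the queue's list */
+    struct entry     *peer_next;    /* next entry on the peer's list */
+};
+struct peer {                       /* one destination within one job */
+    struct job       *job;
+    struct queue     *queue;
+    struct entry     *entry_list;
+    struct peer      *next;
+};
+struct queue {                      /* one destination within one transport */
+    struct transport *transport;
+    struct entry     *todo_list;    /* waiting for a delivery agent */
+    struct entry     *busy_list;    /* handed to delivery agents */
+    struct queue     *next;
+};
+struct job {                        /* one message within one transport */
+    struct message   *message;
+    struct transport *transport;
+    struct peer      *peer_list;
+    struct job       *next;
+};
+struct transport {
+    struct queue     *queue_list;
+    struct job       *job_list;     /* ordering is discussed below */
+};
+struct message {
+    struct job       *job_list;     /* one job per transport used */
+};
+</pre>
+</blockquote>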
+
+<p>
+
+These terms are used extensively in the text below; feel free to
+look up the descriptions above any time you feel you have lost
+track of what is what.
+
+</p>
+
+<h3> <a name="<tt>nqmgr</tt>_pickup"> What happens when nqmgr picks
+up the message </a> </h3>
+
+<p>
+
+Whenever <tt>nqmgr</tt> moves a queue file into the active queue,
+the following happens: It reads all necessary information from the
+queue file as <tt>oqmgr</tt> does, and also reads as many recipients
+as possible - more on that later; for now, let's just pretend it
+always reads all recipients.
+
+</p>
+
+<p>
+
+Then it resolves the recipients as <tt>oqmgr</tt> does, which
+means obtaining an (address, nexthop, transport) triple for each
+recipient. For each triple, it finds the transport; if it does not
+exist yet, it instantiates it (unless it's dead). Within the
+transport, it finds the destination queue for the given nexthop; if it
+does not exist yet, it instantiates it (unless it's dead). The
+triple is then bound to the given destination queue. This happens in
+qmgr_resolve() and is basically the same as in <tt>oqmgr</tt>.
+
+</p>
+
+<p>
+
+Then for each triple which was bound to some queue (and thus
+transport), the program finds the job which represents the message
+within that transport's context; if it does not exist yet, it
+instantiates it. Within the job, it finds the peer which represents
+the bound destination queue within this job's context; if it does
+not exist yet, it instantiates it. Finally, it stores the address
+from the resolved triple to the recipient entry which is appended
+to both the queue entry list and the peer entry list. The addresses
+for the same nexthop are batched in the entries up to the recipient_concurrency
+limit for that transport. This happens in qmgr_assign() and, apart
+from the fact that it operates with job and peer structures, is basically the
+same as in <tt>oqmgr</tt>.
+
+</p>
+
+<p>
+
+When the job is instantiated, it is enqueued on the transport's job
+list based on the time its message was picked up by <tt>nqmgr</tt>.
+For the first batch of recipients this means it is appended to the
+end of the job list. The ordering of the job list by the enqueue
+time is important, as we will see shortly.
+
+</p>
+
+<p>
+
+[Now you should have a pretty good idea of the state of the
+<tt>nqmgr</tt> after a couple of messages have been picked up, and of
+the relation between all those job, peer, queue and entry structures.]
+
+</p>
+
+<h3> <a name="<tt>nqmgr</tt>_selection"> How does the entry selection
+work </a> </h3>
+
+<p>
+
+Having prepared all the above-mentioned structures, the task of
+the <tt>nqmgr</tt>'s scheduler is to choose the recipient entries
+one at a time and pass them to the delivery agent for the corresponding
+transport. Now how does this work?
+
+</p>
+
+<p>
+
+The first approximation of the new scheduling algorithm is like this:
+
+</p>
+
+<blockquote>
+<pre>
+foreach transport (round-robin-by-transport)
+do
+ if transport busy continue
+ if transport process limit reached continue
+ foreach transport's job (in the order of the transport's job list)
+ do
+ foreach job's peer (round-robin-by-destination)
+ if peer->queue->concurrency < peer->queue->window
+ return next peer entry.
+ done
+ done
+done
+</pre>
+</blockquote>
+
+<p>
+
+Now what is the "order of the transport's job list"? As we know
+already, the job list is by default kept in the order the message
+was picked up by the <tt>nqmgr</tt>. So by default we get round-robin
+selection at the transport level, and within each transport we get
+FIFO message delivery. The round-robin selection of the peers by
+destination is perhaps of little importance in most real-life cases
+(unless the recipient_concurrency limit is reached, there is only
+one peer structure per destination in each job), but theoretically
+it makes sure that even within single jobs, destinations are treated
+fairly.
+
+</p>
+
+<p>
+
+[By now you should have a feeling that you really know how the
+scheduler works under ideal conditions - that is, with no recipient
+resource limits and no destination concurrency problems - except
+for the preemption.]
+
+</p>
+
+<h3> <a name="<tt>nqmgr</tt>_preemption"> How does the preemption
+work </a> </h3>
+
+<p>
+
+As you might perhaps expect by now, the transport's job list does
+not remain sorted by the job's message enqueue time all the time.
+The coolest thing about <tt>nqmgr</tt> is not the simple FIFO
+delivery, but that it is able to slip mail with few recipients
+past the mailing-list bulk mail. This is what the job preemption
+is about - shuffling the jobs on the transport's job list to get
+the best message delivery rates. Now how is it achieved?
+
+</p>
+
+<p>
+
+First I have to tell you that there are in fact two job lists in
+each transport. One is the scheduler's job list, which the scheduler
+is free to play with, while the other one keeps the jobs always
+listed in the order of the enqueue time and is used for the recipient
+pool management we will discuss later. For now, we will deal with
+the scheduler's job list only.
+
+</p>
+
+<p>
+
+So, we have the job list, which is ordered by the time the
+jobs' messages were enqueued, oldest messages first and the most
+recently picked up one at the end. For now, let's assume that there are no
+destination concurrency problems. Without preemption, we pick some
+entry of the first (oldest) job on the list, assign it to a delivery
+agent, pick another one from the same job, assign it again, and so
+on, until all the entries are used and the job is delivered. We
+would then move on to the next job, and so on and on. Now how do we
+manage to sneak in some entries from the recently added jobs when
+the first job on the job list belongs to a message going to a
+mailing list and has thousands of recipient entries?
+
+</p>
+
+<p>
+
+The <tt>nqmgr</tt>'s answer is that we can artificially "inflate"
+the delivery time of that first job by some constant for free - it
+is basically the same trick you might remember as "accumulation of
+potential" from the amortized complexity lessons. For example,
+instead of delivering the entries of the first job on the job list
+every time a delivery agent becomes available, we can do it only
+every second time. If you view the moments the delivery agent becomes
+available on a timeline as "delivery slots", then instead of using
+every delivery slot for the first job, we can use only every other
+slot, and still the overall delivery efficiency of the first job
+remains the same. So the delivery <tt>11112222</tt> becomes
+<tt>1.1.1.1.2.2.2.2</tt> (1 and 2 are the imaginary job numbers, .
+denotes the free slot). Now what do we do with free slots?
+
+</p>
+
+<p>
+
+As you might have guessed, we will use them for sneaking in the mail
+with few recipients. For example, if we have one four-recipient
+mail followed by four one-recipient mails, the delivery sequence
+(that is, the sequence in which the jobs are assigned to the
+delivery slots) might look like this: <tt>12131415</tt>. Hmm, fine
+for sneaking in the single-recipient mail, but how do we sneak in
+mail with more than one recipient? Say we have one four-recipient
+mail followed by two two-recipient mails?
+
+</p>
+
+<p>
+
+The simple answer would be to use the delivery sequence <tt>12121313</tt>.
+But the problem is that this does not scale well. Imagine you have
+mail with a thousand recipients followed by mail with a hundred
+recipients. It is tempting to suggest a delivery sequence like
+<tt>121212....</tt>, but alas! Imagine another mail arrives with,
+say, ten recipients. There are no free slots anymore, so it can't
+slip by, not even if it had just one recipient. It will be stuck until
+the hundred-recipient mail is delivered, which really sucks.
+
+</p>
+
+<p>
+
+So, it becomes obvious that while inflating the message delivery
+time to get free slots is a great idea, one has to be really careful
+how the free slots are assigned, otherwise one might get cornered. So,
+how does <tt>nqmgr</tt> really use the free slots?
+
+</p>
+
+<p>
+
+The key idea is that one does not have to generate the free slots
+in a uniform way. The delivery sequence <tt>111...1</tt> is no
+worse than <tt>1.1.1.1</tt>, in fact, it is even better as some
+entries are in the first case selected earlier than in the second
+case, and none is selected later! So it is possible to first
+"accumulate" the free delivery slots and then use them all at once.
+It is even possible to accumulate some, then use them, then accumulate
+some more and use them again, as in <tt>11..1.1</tt> .
+
+</p>
+
+<p>
+
+Let's get back to the one hundred recipient example. We now know
+that we could first accumulate one hundred free slots, and only
+then preempt the first job and sneak the one hundred
+recipient mail in. Applying the algorithm recursively, we see the
+hundred-recipient job can accumulate ten free delivery slots, and
+then we could preempt it and sneak in the ten-recipient mail...
+Wait wait wait! Could we? Aren't we overinflating the original one
+thousand recipient mail?
+
+</p>
+
+<p>
+
+Well, although it may look like it at first glance, another trick will
+allow us to answer "no, we are not!". If we had said that we will
+inflate the delivery time twice at maximum, and then we consider
+every other slot as a free slot, then we would overinflate in case
+of the recursive preemption. BUT! The trick is that if we use only
+every n-th slot as a free slot for n>2, there is always some worst
+inflation factor which we can guarantee not to be breached, even
+if we apply the algorithm recursively. To be precise, if for every
+k>1 normally used slots we accumulate one free delivery slot, then
+the inflation factor is not worse than k/(k-1) no matter how many
+recursive preemptions happen. And it's not worse than (k+1)/k if
+only non-recursive preemption happens. Now, having got through the
+theory and the related math, let's see how <tt>nqmgr</tt> implements
+this.
+
+</p>
+
+<p>
+
+Each job has a so-called "available delivery slot" counter. Each
+transport has a <i>transport</i>_delivery_slot_cost parameter, which
+defaults to the default_delivery_slot_cost parameter, which is 5
+by default. This is the k from the paragraph above. Each time k
+entries of the job are selected for delivery, this counter is
+incremented by one. Once there are some slots accumulated, a job which
+requires no more than that amount of slots to be fully delivered
+can preempt this job.
+
+</p>
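+
+<p>
+
+As a minimal sketch (with invented names, and using the
+per-k-entries approximation above rather than the multiplied form
+the actual code uses), the bookkeeping could look like this:
+
+</p>
+
+<blockquote>
+<pre>
+/* Illustrative sketch only, not the qmgr sources. */
+#define SLOT_COST 5                     /* k; cf. default_delivery_slot_cost */
+
+struct sjob {
+    int     selected;                   /* entries selected for delivery */
+    int     avail_slots;                /* accumulated free delivery slots */
+    int     todo_entries;               /* (estimated) entries left to deliver */
+};
+
+void    entry_selected(struct sjob *job)
+{
+    if (++job->selected % SLOT_COST == 0)
+	job->avail_slots += 1;          /* one free slot per k selections */
+}
+
+int     can_preempt(struct sjob *current, struct sjob *candidate)
+{
+    /* The candidate must fit into the slots the current job accumulated. */
+    return (candidate->todo_entries <= current->avail_slots);
+}
+</pre>
+</blockquote>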
+
+<p>
+
+[Well, the truth is, the counter is incremented every time an entry
+is selected, and it is divided by k when it is used. Or, more
+precisely, there is no division; the other side of the comparison is
+multiplied by k. But for understanding, it's good enough to use
+the above approximation of the truth.]
+
+</p>
+
+<p>
+
+OK, so now we know the conditions which must be satisfied for one
+job to preempt another one. But which job gets preempted, how do
+we choose which job preempts it if there are several valid candidates,
+and when exactly does all this happen?
+
+</p>
+
+<p>
+
+The answer to the first part is simple. The job whose entry was
+selected last is the so-called current job. Normally, it is
+the first job on the scheduler's job list, but destination concurrency
+limits may change this, as we will see later. It is always only the
+current job which may get preempted.
+
+</p>
+
+<p>
+
+Now for the second part. The current job has a certain number of
+recipient entries, and as such may accumulate at most some number
+of available delivery slots. It might have already accumulated some,
+and perhaps even already used some when it was preempted before
+(remember, a job can be preempted several times). In either case,
+we know how many are accumulated and how many are left to deliver,
+so we know how many more it may accumulate at most. Every other
+job which can be delivered in fewer than that number of slots is a
+valid candidate for preemption. How do we choose among them?
+
+</p>
+
+<p>
+
+The answer is - the one with maximum enqueue_time/recipient_entry_count.
+That is, the older the job is, the harder we should try to deliver
+it in order to get the best message delivery rates. These rates are of
+course subject to how many recipients the message has, therefore
+the division by the recipient (entry) count. No one shall be surprised
+that a message with n recipients takes n times longer to deliver than
+a message with one recipient.
+
+</p>
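+
+<p>
+
+As a sketch (names invented, not the actual qmgr_choose_candidate()
+code), the "maximum enqueue_time/recipient_entry_count" rule can be
+evaluated without division by cross-multiplying; "age" here is the
+time the job's message has spent on the job list:
+
+</p>
+
+<blockquote>
+<pre>
+/* Illustrative sketch only.  Times are in seconds. */
+struct cjob {
+    long    enqueued;                   /* when the message was picked up */
+    long    entry_count;                /* (estimated) recipient entries */
+};
+
+/* Non-zero if job a is a better preemption candidate than job b. */
+int     better_candidate(struct cjob *a, struct cjob *b, long now)
+{
+    return ((now - a->enqueued) * b->entry_count >
+	    (now - b->enqueued) * a->entry_count);
+}
+</pre>
+</blockquote>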
+
+<p>
+
+Now let's recap the candidate selection. Isn't it too complicated?
+Why aren't the candidates chosen only from among the jobs which can
+be delivered within the number of slots the current job has already
+accumulated? Why do we need to estimate how much it has yet to
+accumulate? If you found out the answer, congratulate yourself. If
+we did it this simple way, we would always choose the candidate
+with the fewest recipient entries. If there were enough single-recipient
+mails coming in, they would always slip by the bulk mail as soon
+as possible, and mail with two or more recipients would never get
+a chance, no matter how long it has been sitting around in the
+job list.
+
+</p>
+
+<p>
+
+This candidate selection has an interesting implication - when
+we choose the best candidate for preemption (this is done in
+qmgr_choose_candidate()), it may happen that we cannot use it for
+preemption immediately. This leads to an answer to the last part
+of the original question - when does the preemption happen?
+
+</p>
+
+<p>
+
+The preemption attempt happens every time the transport's next recipient
+entry is to be chosen for delivery. To avoid needless overhead, the
+preemption is not attempted if the current job could never accumulate
+more than <i>transport</i>_minimum_delivery_slots (defaults to
+default_minimum_delivery_slots which defaults to 3). If there are
+already enough accumulated slots to preempt the current job by the
+chosen best candidate, it is done immediately. This basically means
+that the candidate is moved in front of the current job on the
+scheduler's job list and the accumulated slot counter is decreased
+by the amount used by the candidate. If there are not enough slots...
+well, I could say that nothing happens and another preemption
+is attempted the next time. But that's not the complete truth.
+
+</p>
+
+<p>
+
+The truth is that it turns out that it is not really necessary to
+wait until the job's counter accumulates all the delivery slots in
+advance. Say we have a ten-recipient mail followed by two two-recipient
+mails. If the preemption happened when enough delivery slots had
+accumulated (assuming slot cost 2), the delivery sequence would become
+<tt>11112211113311</tt>. Now what would we get if we waited
+only for 50% of the necessary slots to accumulate and promised to
+wait for the remaining 50% later, after we get back
+to the preempted job? If we use such a slot loan, the delivery sequence
+becomes <tt>11221111331111</tt>. As we can see, this is not
+considerably worse for the delivery of the ten-recipient mail, but
+it allows the small messages to be delivered sooner.
+
+</p>
+
+<p>
+
+The concept of these slot loans is where the
+<i>transport</i>_delivery_slot_discount and
+<i>transport</i>_delivery_slot_loan come from (they default to
+default_delivery_slot_discount and default_delivery_slot_loan, whose
+values are by default 50 and 3, respectively). The discount (resp.
+loan) specifies how many percent (resp. how many slots) one "gets
+in advance", when the number of slots required to deliver the best
+candidate is compared with the number of slots the current job has
+accumulated so far.
+
+</p>
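+
+<p>
+
+Put as a single condition (one plausible reading of the text above,
+with invented variable names), the chosen candidate may preempt the
+current job as soon as
+
+</p>
+
+<blockquote>
+<pre>
+needed_slots * (100 - discount) / 100 - loan <= accumulated_slots
+</pre>
+</blockquote>
+
+<p>
+
+where needed_slots is the number of slots required to fully deliver
+the candidate and accumulated_slots is the current job's counter.
+
+</p>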
+
+<p>
+
+And that pretty much concludes this chapter.
+
+</p>
+
+<p>
+
+[Now you should have a feeling that you pretty much understand the
+scheduler and the preemption, or at least that you will have it
+after you read the last chapter a couple more times. You shall clearly
+see the job list and the preemption happening at its head, in ideal
+delivery conditions. The feeling of understanding shall last until
+you start wondering what happens if some of the jobs are blocked,
+which you might eventually figure out correctly from what has been
+said already. But I would be surprised if your mental image of the
+scheduler's functionality is not completely shattered once you
+start wondering how it works when not all recipients may be read
+in-core. More on that later.]
+
+</p>
+
+<h3> <a name="<tt>nqmgr</tt>_concurrency"> How destination concurrency
+limits affect the scheduling algorithm </a> </h3>
+
+<p>
+
+The <tt>nqmgr</tt> uses the same algorithm for destination concurrency
+control as <tt>oqmgr</tt>. Now what happens when the destination
+limits are reached and no more entries for that destination may be
+selected by the scheduler?
+
+</p>
+
+<p>
+
+From the user's point of view, it is all simple. If some of the peers
+of a job can't be selected, those peers are simply skipped by the
+entry selection algorithm (the pseudo-code described before) and
+only the selectable ones are used. If none of the peers may be
+selected, the job is declared a "blocker job". Blocker jobs are
+skipped by the entry selection algorithm and they are also excluded
+from the candidates for preemption of the current job. Thus the scheduler
+effectively behaves as if the blocker jobs didn't exist on the job
+list at all. As soon as at least one of the peers of a blocker job
+becomes unblocked (that is, the delivery agent handling the delivery
+of the recipient entry for the given destination successfully finishes),
+the job's blocker status is removed and the job again participates
+in all further scheduler actions normally.
+
+</p>
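+
+<p>
+
+In outline (a sketch of the behavior just described, with invented
+names; in particular, the per-queue list of peers is assumed here
+purely for illustration):
+
+</p>
+
+<blockquote>
+<pre>
+/* Illustrative sketch only, not the qmgr sources. */
+struct bjob { int blocker; };
+struct bpeer { struct bjob *job; struct bpeer *next; };
+struct bqueue { struct bpeer *peers; };
+
+/* Called when no peer of the job may currently be selected. */
+void    mark_blocker(struct bjob *job)
+{
+    job->blocker = 1;               /* skip in selection and preemption */
+}
+
+/* Called when a delivery to this destination completes. */
+void    queue_unblocked(struct bqueue *queue)
+{
+    struct bpeer *peer;
+
+    /* All jobs with peers for this destination lose blocker status. */
+    for (peer = queue->peers; peer != 0; peer = peer->next)
+	peer->job->blocker = 0;
+}
+</pre>
+</blockquote>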
+
+<p>
+
+So the summary is that users don't really have to be concerned
+about the interaction of the destination limits and the scheduling
+algorithm. It works well on its own and there are no knobs they
+would need to use to control it.
+
+</p>
+
+<p>
+
+From a programmer's point of view, the blocker jobs complicate the
+scheduler quite a lot. Without them, the jobs on the job list would
+normally be delivered in strict FIFO order. If the current job is
+preempted, the job preempting it is completely delivered unless it
+is preempted itself. Without blockers, the current job is thus
+always either the first job on the job list, or the top of the stack
+of jobs preempting the first job on the job list.
+
+</p>
+
+<p>
+
+The visualization of the job list and the preemption stack without
+blockers would be like this:
+
+</p>
+
+<blockquote>
+<pre>
+first job-> 1--2--3--5--6--8--... <- job list
+on job list |
+ 4 <- preemption stack
+ |
+current job-> 7
+</pre>
+</blockquote>
-<h3> <a name="job_design"> How the non-preemptive queue manager scheduler works </a> </h3>
+<p>
-<p> The following text is from Patrik Rak and should be read together
-with the postconf(5) manual that describes each configuration
-parameter in detail. </p>
+In the example above we see that job 1 was preempted by job 4 and
+then job 4 was preempted by job 7. After job 7 is completed, the
+remaining entries of job 4 are selected, and once they are all
+selected, job 1 continues.
-<p> From user's point of view, oqmgr(8) and qmgr(8) are both the same,
-except for how next message is chosen when delivery agent becomes
-available. You already know that oqmgr(8) uses round-robin by destination
-while qmgr(8) uses simple FIFO, except for some preemptive magic.
-The postconf(5) manual documents all the knobs the user
-can use to control this preemptive magic - there is nothing else
-to the preemption than the quite simple conditions described in there.
</p>
-<p> As for programmer-level documentation, this will have to be
-extracted from all those emails we have exchanged with Wietse [rats!
-I hoped that Patrik would do the work for me -- Wietse] But I think
-there are no missing bits which we have not mentioned in our
-conversations. </p>
+<p>
-<p> However, even from programmer's point of view, there is nothing
-more to add to the message scheduling idea itself. There are few
-things which make it look more complicated than it is, but the
-algorithm is the same as the user perceives it. The summary of the
-differences of the programmer's view from the user's view are: </p>
+As we see, it's all very clean and straightforward. Now how does
+this change because of blockers?
+
+</p>
+
+<p>
+
+The answer is: a lot. Any job may become a blocker job at any time,
+and also become a normal job again at any time. This has several
+important implications:
+
+</p>
<ol>
- <li> <p> Simplification of terms for users: The user knows
- about messages and recipients. The program itself works with
- jobs (one message is split among several jobs, one per each
- transport needed to deliver the message) and queue entries
- (each entry may group several recipients for same destination).
- Then there is the peer structure introduced by qmgr(8) which is
- simply per-job analog of the queue structure. </p>
-
- <li> <p> Dealing with concurrency limits: The actual implementation
- is complicated by the fact that the messages (resp. jobs) may
- not be delivered in the exactly scheduled order because of the
- concurrency limits. It is necessary to skip some "blocker" jobs
- when the concurrency limit is reached and get back to them
- again when the limit permits. </p>
-
- <li> <p> Dealing with resource limits: The actual implementation is
- complicated by the fact that not all recipients may be read in-core.
- Therefore each message has some recipients in-core and some may
- remain on-file. This means that a) the preemptive algorithm needs
- to work with recipient count estimates instead of exact counts, b)
- there is extra code which needs to manipulate the per-transport
- pool of recipients which may be read in-core at the same time, and
- c) there is extra code which needs to be able to read recipients
- into core in batches and which is triggered at appropriate moments. </p>
-
- <li> <p> Doing things efficiently: All important things I am
- aware of are done in the minimum time possible (either directly
- or at least when amortized complexity is used), but to choose
- which job is the best candidate for preempting the current job
- requires linear search of up to all transport jobs (the worst
- theoretical case - the reality is much better). As this is done
- every time the next queue entry to be delivered is about to be
- chosen, it seemed reasonable to add cache which minimizes the
- overhead. Maintenance of this candidate cache slightly obfuscates
- things.
+<li> <p>
+
+The jobs may be completed in arbitrary order. For example, in the
+example above, if the current job 7 becomes blocked, the next job
+4 may complete before job 7 becomes unblocked again. Or if both
+7 and 4 are blocked, then 1 is completed, then 7 becomes unblocked
+and is completed, then 2 is completed and only after that 4 becomes
+unblocked and is completed... You get the idea.
+
+</p>
+
+<p>
+
+[Interesting side note: even when jobs are delivered out of order,
+from a single destination's point of view the jobs are still delivered
+in the expected order (that is, FIFO unless there was some preemption
+involved). This is because whenever a destination queue becomes
+unblocked (the destination limit allows selection of more recipient
+entries for that destination), all jobs which have peers for that
+destination are unblocked at once.]
+
+</p>
+
+<li> <p>
+
+The idea of the preemption stack at the head of the job list is
+gone. That is, it must be possible to preempt any job on the job
+list. For example, if jobs 7, 4, 1 and 2 in the example above
+all become blocked, job 3 becomes the current job. And of course
+we do not want the preemption to be affected by whether there
+are blocked jobs or not. Therefore, if it turns out that job
+3 might be preempted by job 6, the implementation shall make it
+possible.
+
+</p>
+
+<li> <p>
+
+The idea of the linear preemption stack itself is gone. It's no
+longer true that one job is always preempted by only one job at one
+time (that is directly preempted, not counting the recursively
+nested jobs). For example, in the example above, job 1 is directly
+preempted by only job 4, and job 4 by job 7. Now assume job 7 becomes
+blocked, and job 4 is being delivered. If it accumulates enough
+delivery slots, it is natural that it might be preempted for example
+by job 8. Now job 4 is preempted by both job 7 AND job 8 at the
+same time.
+
+</p>
</ol>
-<p> The points 2 and 3 are those which made the implementation
-(look) complicated and were the real coding work, but I believe
-that to understand the scheduling algorithm itself (which was the
-real thinking work) is fairly easy. </p>
+<p>
+
+Now combine points 2) and 3) with point 1) again and you realize
+that the relations on the once linear job list become pretty
+complicated. If we extend the point 3) example: jobs 7 and 8 preempt
+job 4, now job 8 becomes blocked too, then job 4 completes. Tricky,
+huh?
+
+</p>
+
+<p>
+
+If I were to illustrate the relations after the above-mentioned
+examples (excluding those in point 1), the situation would look like this:
+
+</p>
+
+<blockquote>
+<pre>
+ v- parent
+
+adoptive parent -> 1--2--3--5--... <- "stack" level 0
+ | |
+parent gone -> ? 6 <- "stack" level 1
+ / \
+children -> 7 8 ^- child <- "stack" level 2
+
+ ^- siblings
+</pre>
+</blockquote>
+
+<p>
+
+Now how does <tt>nqmgr</tt> deal with all these complicated relations?
+
+</p>
+
+<p>
+
+Well, it maintains them all as described, but fortunately, all these
+relations are necessary only for purposes of proper counting of
+available delivery slots. For purposes of ordering the jobs for
+entry selection, the original rule still applies: "the job preempting
+the current job is moved in front of the current job on the job
+list". So for entry selection purposes, the job relations remain
+as simple as this:
+
+</p>
+
+<blockquote>
+<pre>
+7--8--1--2--6--3--5--.. <- scheduler's job list order
+</pre>
+</blockquote>
+
+<p>
+
+The job list order and the preemption parent/child/sibling relations
+are maintained separately. And because the selection works only
+with the job list, you can happily forget about those complicated
+relations unless you want to study the <tt>nqmgr</tt> sources. In
+that case the text above might provide some helpful introduction
+to the problem domain. Otherwise I suggest you just forget about
+all this and stick with the user's point of view: the blocker jobs
+are simply ignored.
+
+</p>
+
+<p>
+
+[By now, you should have a feeling that there are more things going
+on under the hood than you ever wanted to know. You decide that
+forgetting about this chapter is the best you can do for the sake
+of your mind's health and you basically stick with the idea of how the
+scheduler works in ideal conditions, when there are no blockers,
+which is good enough.]
+
+</p>
+
+<h3> <a name="<tt>nqmgr</tt>_memory"> Dealing with memory resource
+limits </a> </h3>
+
+<p>
+
+When discussing the <tt>nqmgr</tt> scheduler, we have so far assumed
+that all recipients of all messages in the active queue are completely
+read into memory. This is simply not true. There is an upper
+bound on the amount of memory the <tt>nqmgr</tt> may use, and
+therefore it must impose some limits on the information it may store
+in memory at any given time.
+
+</p>
+
+<p>
+
+First of all, not all messages may be read in-core at once. At any
+time, at most qmgr_message_active_limit messages may be read in-core.
+When read into memory, the messages are picked from the
+incoming and deferred message queues and moved to the active queue
+(incoming having priority), so if there are more than
+qmgr_message_active_limit messages queued, the rest will have to
+wait until (some of) the messages in the active
+queue are completely delivered (or deferred).
+
+</p>
+
+<p>
+
+Even with the limited number of in-core messages, there is another
+limit which must be imposed in order to avoid memory exhaustion.
+Each message may contain a huge number of recipients (tens or hundreds
+of thousands are not uncommon), so if <tt>nqmgr</tt> were to read all
+recipients of all messages in the active queue, it might easily run
+out of memory. Therefore there must be some upper bound on the
+number of message recipients which are read into memory at the
+same time.
+
+</p>
+
+<p>
+
+Before discussing how exactly <tt>nqmgr</tt> implements the recipient
+limits, let's see how the sole existence of the limits themselves
+affects the <tt>nqmgr</tt> and its scheduler.
+
+</p>
+
+<p>
+
+The message limit is straightforward - it just limits the size of
+the lookahead the <tt>nqmgr</tt>'s scheduler has when choosing which
+message can preempt the current one. Messages not in the active
+queue are simply not considered at all.
+
+</p>
+
+<p>
+
+The recipient limit complicates things a bit more. First of all, the
+message reading code must support reading the recipients in batches,
+which among other things means accessing the queue file several
+times and continuing where the last recipient batch ended. This is
+invoked by the scheduler whenever the current job runs out of in-core
+recipients and more are required. It is also done any time all
+in-core recipients of the message are dealt with (which may also
+mean they were deferred) but there are still more in the queue file.
+
+</p>
+
+<p>
+
+The second complication is that with some recipients left unread
+in the queue file, the scheduler can't operate with exact counts
+of recipient entries. With unread recipients, it is not clear how
+many recipient entries there will be, as they are subject to
+per-destination grouping. It is not even clear to what transports
+(and thus jobs) the recipients will be assigned. And with messages
+coming from the deferred queue, it is not even clear how many unread
+recipients are still to be delivered. This all means that the
+scheduler must use only estimates of how many recipient entries
+there will be. Fortunately, it is possible to estimate the minimum
+and maximum correctly, so the scheduler can always err on the safe
+side. Obviously, the better the estimates, the better the results, so
+it is best when we are able to read all recipients in-core and turn
+the estimates into exact counts, or at least try to read as many
+as possible to make the estimates as accurate as possible.
+
+</p>
+
+<p>
+
+The third complication is that it is no longer true that the scheduler
+is done with a job once all of its in-core recipients are delivered.
+It is possible that the job will be revived later, when another
+batch of recipients is read in core. It is also possible that some
+jobs will be created for the first time long after the first batch
+of recipients was read in core. The <tt>nqmgr</tt> code must be
+ready to handle all such situations.
+
+</p>
+
+<p>
+
+And finally, the fourth complication is that the <tt>nqmgr</tt>
+code must somehow impose the recipient limit itself. Now how does
+it achieve it?
+
+</p>
+
+<p>
+
+Perhaps the easiest solution would be to say that each message may
+have at most X recipients stored in-core, but such a solution would
+be poor for several reasons. With reasonable qmgr_message_active_limit
+values, the X would have to be quite low to maintain a reasonable
+memory footprint. And with a low X, lots of things would not work well.
+The <tt>nqmgr</tt> would have problems using the
+<i>transport</i>_destination_recipient_limit efficiently. The
+scheduler's preemption would be suboptimal as the recipient count
+estimates would be inaccurate. The message queue file would have
+to be accessed many times to read in more recipients again and
+again.
+
+</p>
+
+<p>
+
+Therefore it seems reasonable to have a solution which does not use
+a limit imposed on a per-message basis, but which maintains a pool
+of available recipient slots, which can be shared among all messages
+in the most efficient manner. And as we do not want separate
+transports to compete for resources whenever possible, it seems
+appropriate to maintain such a recipient pool for each transport
+separately. This is the general idea; now how does it work in
+practice?
+
+</p>
+
+<p>
+
+First we have to solve a little chicken-and-egg problem. If we want
+to use the per-transport recipient pools, we first need to know to
+what transport(s) the message is assigned. But we will find that
+out only after we read in the recipients. So it is obvious
+that we first have to read in some recipients, use them to find out
+to what transports the message is to be assigned, and only after
+that can we use the per-transport recipient pools.
+
+</p>
+
+<p>
+
+Now how many recipients shall we read for the first time? This is
+what the qmgr_message_recipient_minimum and qmgr_message_recipient_limit
+values control. The qmgr_message_recipient_minimum value specifies
+how many recipients of each message we will read the first time,
+no matter what. It is necessary to read at least one recipient
+before we can assign the message to a transport and create the first
+job. However, reading only qmgr_message_recipient_minimum recipients
+even if there are only a few messages with few recipients in-core would
+be wasteful. Therefore, if there are fewer than qmgr_message_recipient_limit
+recipients in-core so far, the first batch of recipients may be
+larger than qmgr_message_recipient_minimum - as large as is required
+to reach the qmgr_message_recipient_limit limit.
+
+</p>
+
+<p>
+
+Once the first batch of recipients has been read in-core and the message
+jobs have been created, the size of the subsequent recipient batches (if
+any - of course it's best when all recipients are read in one batch)
+is based solely on the position of the message jobs on their
+corresponding transport's job lists. Each transport has a pool of
+<i>transport</i>_recipient_limit recipient slots which it can
+distribute among its jobs (how this is done is described later).
+The subsequent recipient batch may be as large as the sum of all
+recipient slots of all jobs of the message permits (plus the
+qmgr_message_recipient_minimum amount which always applies).
+
+</p>
+
+<p>
+
+For example, if a message has three jobs, the first with 1 recipient
+still in-core and 4 recipient slots, the second with 5 recipients in-core
+and 5 recipient slots, and the third with 2 recipients in-core and 0
+recipient slots, it has 1+5+2=7 recipients in-core and 4+5+0=9 job
+recipient slots in total. This means that we could immediately
+read 9-7=2 (plus qmgr_message_recipient_minimum) more recipients of
+that message in-core.
+
+</p>
+
+<p>
+
+The above example illustrates several things which might be worth
+mentioning explicitly: first, note that although the per-transport
+slots are assigned to particular jobs, we can't guarantee that once
+the next batch of recipients is read in-core, the corresponding
+numbers of recipients will be assigned to those jobs. The jobs lend
+their slots to the message as a whole, so it is possible that some
+jobs end up sponsoring other jobs of their message. For example,
+if in the example above the 2 newly read recipients were assigned
+to the second job, the first job sponsored the second job with 2
+slots. The second notable thing is the third job, which has more
+recipients in-core than it has slots. Apart from the sponsoring by
+another job we just saw, this can be a result of the first recipient batch,
+which is sponsored from the global recipient pool of qmgr_message_recipient_limit
+recipients. It can also be sponsored from the message recipient
+pool of qmgr_message_recipient_minimum recipients.
+
+</p>
+
+<p>
+
+Now how does each transport distribute the recipient slots among
+its jobs? The strategy is quite simple. As most scheduler activity
+happens at the head of the job list, it is our intention to make
+sure that the scheduler has the best estimates of the recipient
+counts for those jobs. As we mentioned above, this means that we
+want to try to make sure that the messages of those jobs have all
+recipients read in-core. Therefore the transport distributes the
+slots "along" the job list from start to end. In this case the job
+list sorted by message enqueue time is used, because it doesn't
+change over time as the scheduler's job list does.
+
+</p>
+
+<p>
+
+More specifically, each time a job is created and appended to the
+job list, it gets all unused recipient slots from its transport's
+pool. It keeps them until all recipients of its message are read.
+When this happens, all unused recipient slots are transferred to
+the next job (which is in fact now the first such job) on the job
+list which still has some recipients unread, or eventually back to
+the transport pool if there is no such job. Such a transfer also
+happens whenever a recipient entry of that job is delivered.
+
+</p>
+
+<p>
+
+There is also a scenario when a job is not appended to the end of
+the job list (for example, it was created as a result of a second or
+later recipient batch). Then it works exactly as above, except that
+if it was put in front of the first unread job (that is, the job
+of a message which still has some unread recipients in the queue file),
+that job is first forced to return all of its unused recipient slots
+to the transport pool.
+
+</p>
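+
+<p>
+
+A sketch of the slot handover just described (invented names; the
+helper next_unread_job() is hypothetical and stands for "the first
+job after this one on the enqueue-time job list whose message still
+has unread recipients"):
+
+</p>
+
+<blockquote>
+<pre>
+/* Illustrative sketch only, not the qmgr sources. */
+struct rjob {
+    int     unused_slots;               /* recipient slots held by this job */
+};
+struct rtransport {
+    int     pool;                       /* unused transport recipient slots */
+};
+
+struct rjob *next_unread_job(struct rtransport *tp, struct rjob *job);
+
+/* Hand a job's unused recipient slots onward, or back to the pool. */
+void    pass_slots_on(struct rtransport *tp, struct rjob *job)
+{
+    struct rjob *next = next_unread_job(tp, job);
+
+    if (next != 0)
+	next->unused_slots += job->unused_slots;
+    else
+	tp->pool += job->unused_slots;
+    job->unused_slots = 0;
+}
+</pre>
+</blockquote>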
+
+<p>
+
+The algorithm just described leads to the following state: The first
+unread job on the job list always gets all the remaining recipient
+slots of that transport (if there are any). The jobs queued before
+this job are completely read (that is, all recipients of their
+message were already read in-core) and have at most as many slots
+as they still have recipients in-core (the maximum is there because
+of the sponsoring mentioned before), and the jobs after this job get
+nothing from the transport recipient pool (unless they got something
+before and the first unread job was then created and enqueued in
+front of them - in such a case they also get at most as many
+slots as they have recipients in-core).
+
+</p>
+
+<p>
+
+Things work fine in such a state most of the time, because the
+current job is either completely read in-core or has as many recipient
+slots as there are, but there is one situation which we still have
+to take care of specially. Imagine that the current job is preempted
+by some unread job from the job list and there are no more recipient
+slots available, so this new current job could read only batches
+of qmgr_message_recipient_minimum recipients at a time. This would
+really degrade performance. For this reason, each transport has an
+extra pool of <i>transport</i>_extra_recipient_limit recipient
+slots, dedicated exactly to this situation. Each time an unread
+job preempts the current job, it gets half of the remaining recipient
+slots from the normal pool and this extra pool.
+
+</p>
+
+<p>
+
+And that's it. It sure does sound pretty complicated, but fortunately
+most people don't really have to care how exactly it works as long
+as it works. Perhaps the only important things to know for most
+people are the following upper bound formulas:
+
+</p>
+
+<p>
+
+Each transport has at most
+
+</p>
+
+<blockquote>
+<pre>
+max(
+qmgr_message_recipient_minimum * qmgr_message_active_limit
++ *_recipient_limit + *_extra_recipient_limit,
+qmgr_message_recipient_limit
+)
+</pre>
+</blockquote>
+
+<p>
+
+recipients in-core.
+
+</p>
+
+<p>
+
+The total number of recipients in-core is
+
+</p>
+
+<blockquote>
+<pre>
+max(
+qmgr_message_recipient_minimum * qmgr_message_active_limit
++ sum( *_recipient_limit + *_extra_recipient_limit ),
+qmgr_message_recipient_limit
+)
+</pre>
+</blockquote>
+
+<p>
+
+where the sum is over all used transports.
+
+</p>
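+
+<p>
+
+For illustration only, with made-up values qmgr_message_recipient_minimum
+= 10, qmgr_message_active_limit = 100, qmgr_message_recipient_limit =
+2000, and a single transport with *_recipient_limit = 1000 and
+*_extra_recipient_limit = 100, the per-transport bound works out as
+
+</p>
+
+<blockquote>
+<pre>
+max(10 * 100 + (1000 + 100), 2000) = max(2100, 2000) = 2100
+</pre>
+</blockquote>
+
+<p>
+
+recipients in-core at most.
+
+</p>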
+
+<p>
+
+And this terribly complicated chapter concludes the documentation
+of the <tt>nqmgr</tt> scheduler.
+
+</p>
+
+<p>
+
+[By now you should theoretically know the <tt>nqmgr</tt> scheduler
+inside out. In practice, you still hope that you will never have
+to really understand the last chapter or two completely, and
+fortunately most people really won't. Understanding how the scheduler
+works in ideal conditions is more than good enough for the vast majority
+of users.]
+
+</p>
<h2> <a name="credits"> Credits </a> </h2>
<li> These simplifications, and their modular implementation, helped
to develop further insights into the different roles that positive
-and negative concurrency feedback play, and helped to avoid all the
-known worst-case scenarios.
+and negative concurrency feedback play, and helped to identify some
+worst-case scenarios.
</ul>
/etc/postfix/main.cf:
smtp_tls_CAfile = /etc/postfix/cacert.pem
smtp_tls_session_cache_database =
- btree:/var/spool/postfix/smtp_tls_session_cache
+ btree:/var/lib/postfix/smtp_tls_session_cache
smtp_use_tls = yes
smtpd_tls_CAfile = /etc/postfix/cacert.pem
smtpd_tls_cert_file = /etc/postfix/FOO-cert.pem
smtpd_tls_key_file = /etc/postfix/FOO-key.pem
smtpd_tls_received_header = yes
smtpd_tls_session_cache_database =
- btree:/var/spool/postfix/smtpd_tls_session_cache
+ btree:/var/lib/postfix/smtpd_tls_session_cache
tls_random_source = dev:/dev/urandom
# Postfix 2.3 and later
smtpd_tls_security_level = may
# * The postconf2man tool leaves unrecognized HTML in place as a
# reminder that it is not supported.
#
+# * Text between <!-- and --> is stripped out. The <!-- and -->
+# must appear on separate lines.
+#
# Also:
#
# * All <dt> and <dd>text must be closed with </dt> and </dd>.
<i>number</i> equal to "1", a destination's delivery concurrency
is decremented by 1 after each failed pseudo-cohort. </dd>
+<!--
+
<dt> <b><i>number</i> / sqrt_concurrency </b> </dt>
<dd> Variable feedback of "<i>number</i> / sqrt(delivery concurrency)".
The <i>number</i> must be in the range 0..1 inclusive. This setting
may be removed in a future version. </dd>
+-->
+
</dl>
<p> A pseudo-cohort is the number of deliveries equal to a destination's
<i>number</i> equal to "1", a destination's delivery concurrency
is incremented by 1 after each successful pseudo-cohort. </dd>
+<!--
+
<dt> <b><i>number</i> / sqrt_concurrency </b> </dt>
<dd> Variable feedback of "<i>number</i> / sqrt(delivery concurrency)".
The <i>number</i> must be in the range 0..1 inclusive. This setting
may be removed in a future version. </dd>
+-->
+
</dl>
<p> A pseudo-cohort is the number of deliveries equal to a destination's
* Patches change both the patchlevel and the release date. Snapshots have no
* patchlevel; they change the release date only.
*/
-#define MAIL_RELEASE_DATE "20071207"
+#define MAIL_RELEASE_DATE "20071208"
#define MAIL_VERSION_NUMBER "2.5"
#ifdef SNAPSHOT
.c.o:; $(CC) $(CFLAGS) -c $*.c
$(PROG): $(OBJS) $(LIBS)
- $(CC) $(CFLAGS) -o $@ $(OBJS) $(LIBS) $(SYSLIBS) -lm
+ $(CC) $(CFLAGS) -o $@ $(OBJS) $(LIBS) $(SYSLIBS)
$(OBJS): ../../conf/makedefs.out
#define QMGR_FEEDBACK_IDX_NONE 0 /* no window dependence */
#define QMGR_FEEDBACK_IDX_WIN 1 /* 1/window dependence */
+#if 0
#define QMGR_FEEDBACK_IDX_SQRT_WIN 2 /* 1/sqrt(window) dependence */
+#endif
#ifdef QMGR_FEEDBACK_IDX_SQRT_WIN
#include <math.h>
* We can't simply force delivery on this queue: the transport's pending
* count may already be maxed out, and there may be other constraints
* that definitely should be none of our business. The best we can do is
- * to play by the same rules as everyone else: trigger *some* delivery
- * via qmgr_active_drain() and let round-robin selection work for us.
+ * to play by the same rules as everyone else: let qmgr_active_drain()
+ * and round-robin selection take care of message selection.
*/
queue->window = 1;
- if (queue->todo_refcount > 0)
- qmgr_active_drain();
/*
* Every event handler that leaves a queue in the "ready" state should
.c.o:; $(CC) $(CFLAGS) -c $*.c
$(PROG): $(OBJS) $(LIBS)
- $(CC) $(CFLAGS) -o $@ $(OBJS) $(LIBS) $(SYSLIBS) -lm
+ $(CC) $(CFLAGS) -o $@ $(OBJS) $(LIBS) $(SYSLIBS)
$(OBJS): ../../conf/makedefs.out
#define QMGR_FEEDBACK_IDX_NONE 0 /* no window dependence */
#define QMGR_FEEDBACK_IDX_WIN 1 /* 1/window dependence */
+#if 0
#define QMGR_FEEDBACK_IDX_SQRT_WIN 2 /* 1/sqrt(window) dependence */
+#endif
#ifdef QMGR_FEEDBACK_IDX_SQRT_WIN
#include <math.h>
* We can't simply force delivery on this queue: the transport's pending
* count may already be maxed out, and there may be other constraints
* that definitely should be none of our business. The best we can do is
- * to play by the same rules as everyone else: trigger *some* delivery
- * via qmgr_active_drain() and let round-robin selection work for us.
+ * to play by the same rules as everyone else: let qmgr_active_drain()
+ * and round-robin selection take care of message selection.
*/
queue->window = 1;
- if (queue->todo_refcount > 0)
- qmgr_active_drain();
/*
* Every event handler that leaves a queue in the "ready" state should