IEN 187

ISSUES IN INTERNETTING
PART 2: ACCESSING THE INTERNET

Eric C. Rosen
Bolt Beranek and Newman Inc.

June 1981

2. Accessing the Internet

This is the second in a series of papers, the first of which was IEN 184, that examine some of the issues in designing an internet. Familiarity with IEN 184 is presupposed. This particular paper will deal with the issues involved in the design of internet access protocols and software. The issue of addressing, however, is left until the next paper in this series. Part of our technique for exposing and organizing the issues will be to criticize (sometimes rather severely) the current protocols and procedures of the Catenet, even though we do not, at the present time, offer specific alternatives in all cases.

In IEN 184, section 1.4, we outlined four steps in the operation of a Network Structure. Let's now look closely at the first step, viz., how the source Host actually submits a message to the source Switch. In general, a Host will need to run three separate protocols to do this:

- a protocol to utilize the electrical interface between the Host and the initial component of the Pathway it uses to access the source Switch.

- a protocol to govern communication between the Host and the Pathway (PATHWAY ACCESS PROTOCOL).

- a protocol to govern communication between the Host and the source Switch (NETWORK ACCESS PROTOCOL).

We can make this point more concrete by giving some examples. Consider the case of an ARPANET host which wants to access the Catenet via the BBN gateway (which is also a Host on the ARPANET). Then the ARPANET is the Pathway the host uses to access the source Switch (the gateway). If the host is a local or distant host, the electrical interface to the Pathway is the 1822 hardware interface.
If it is a VDH host, the electrical interface is whatever protocol governs the use of the pins on the modem connectors. If it were an X.25 host, the interface might be X.21. The PATHWAY ACCESS PROTOCOL is the 1822 protocol, which governs communication between the host and the first IMP on the Pathway. The NETWORK ACCESS PROTOCOL in this case would be the DoD standard Internet Protocol (IP), which governs communication between the host and the source Switch (gateway). If, on the other hand, we consider the case of an ARPANET host which is communicating with another host on the ARPANET, and whose data stays purely within the ARPANET, 1822 becomes both the NETWORK ACCESS PROTOCOL (since the source Switch is now identical to the source IMP) and the PATHWAY ACCESS PROTOCOL, since the Pathway is now the 1822 hardware connection.

We will have nothing further to say about the electrical interface, since that is really just a straightforward hardware matter. (However, such characteristics of the electrical interface as error rate, for example, might have to be reflected in the design of the Pathway Access Protocol.) The design of both the Pathway Access Protocol and the Network Access Protocol raises a large number of interesting issues, and these shall be the focus of this paper. We believe it to be very unlikely that Host software (or gateway software) can utilize the internet efficiently unless it takes the idiosyncrasies of BOTH the Pathway Access Protocol and the Network Access Protocol into account. A gateway or host software implementer who spends a great deal of time carefully building his IP module, but who then writes a "quick and dirty" 1822 module, is likely to find that his inefficient use of 1822 completely sabotages the advantages which his carefully designed IP is supposed to have.
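The layering in the example above can be illustrated with a toy sketch. This is not the real IP header or 1822 leader format; the one-byte fields and function names are invented stand-ins, meant only to show that the internet datagram produced by the Network Access Protocol travels as the payload of the Pathway Access Protocol.

```python
# Illustrative sketch only: the "header" layouts below are invented
# stand-ins, not the actual IP header or 1822 leader formats.

def ip_encapsulate(payload: bytes, src: int, dst: int) -> bytes:
    """NETWORK ACCESS PROTOCOL layer: wrap user data in a (toy) internet
    header, which will be read by the source Switch (the gateway)."""
    return bytes([4, src, dst, len(payload)]) + payload

def leader_1822(datagram: bytes, imp: int, host: int) -> bytes:
    """PATHWAY ACCESS PROTOCOL layer: wrap the whole internet datagram in a
    (toy) 1822 leader, which will be read by the first IMP on the Pathway."""
    return bytes([0, imp, host, len(datagram)]) + datagram

# An ARPANET host sending Catenet traffic runs BOTH layers; for purely
# intra-ARPANET traffic, only the 1822 layer would be needed.
msg = leader_1822(ip_encapsulate(b"data", src=1, dst=9), imp=5, host=2)
```

The point of the nesting is that a "quick and dirty" outer layer wastes the care spent on the inner one: every internet datagram, however well formed, is carried inside a Pathway-level message.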
Experience with the ARPANET has shown many times that poorly constructed host software can create unnecessary performance problems. It seems, for example, that many 1822 modules completely ignore the flow control restrictions of the ARPANET, thereby significantly reducing the throughput that they can obtain over the ARPANET. We have even encountered many hosts which cannot properly handle some of the control messages of the 1822 protocol, which also leads to a very inefficient use of the ARPANET.

It is not difficult to understand why a host (or gateway) software implementer might overlook the issues having to do with the proper use of the Pathway Access Protocol. There are a number of pressures that, if not dealt with properly at a management level, lead naturally to the neglect of Pathway Access Protocol issues. An internet implementer might want to concentrate on the "new stuff", viz., the Network Access Protocol, IP, and may not be at all interested in the idiosyncrasies of the older Pathway Access Protocol (1822). He might be misled, by the belief that the packet-switching networks underlying the internet should be transparent to it, into believing that those packet-switching networks can be treated as simply as if they were circuits. He might also be under pressure to implement as quickly as possible the necessary functionality to allow internet access. While this sort of pressure is very common, the pressure to make the internet PERFORM well (as opposed to the pressure simply to make it work at all) is generally not felt until much (sometimes years) later. The tendency to neglect performance considerations while giving too much attention to simply obtaining the needed functionality in the quickest way is also reinforced by such "modern" design procedures as top-down design and specification of protocols in formal languages.
While these procedures do have a large number of advantages, they also serve to obscure performance issues. If the researchers and designers of protocols, following modern design methodologies, do not give adequate consideration to performance at the time of protocol design, one can hardly expect the implementers to do so. Yet ARPANET experience has shown again and again that decisions made at the level of implementation, apparently too picayune to catch the attention of the designers, can be important determinants of performance.

Still another reason why protocol software implementers might tend to disregard the niceties of the Pathway Access Protocol is the lack of any adequate protocol software certification procedure. An ARPANET host could be connected to an IMP for months, transferring large amounts of traffic, without ever receiving certain 1822 control messages. Then some sort of change in network conditions could suddenly cause it to receive such a control message once per hour. There really is no way at present that the implementer could have run tests in advance to ensure that his software would continue to perform well under the new circumstances. This problem is somewhat orthogonal to our main interests, but deserves notice.

One of the most important reasons why protocol software implementers tend to ignore the details of the Pathway Access Protocols is the "philosophical" belief that anyone working on internet software really "ought not" to have to worry about the details of the underlying networks. We will not attempt to refute this view, any more than we would attempt to refute the view of a person who claimed that it "ought not" to rain on his day off.
We emphasized in IEN 184 that the characteristics of a Network Structure's Pathways are the main thing that distinguishes one Network Structure from another, and that the problems of internetting really are just the problems of how to build a Network Structure with Pathways as ill-behaved as packet-switching networks. Thus building a successful internet would seem to be a matter of dealing specifically with the behavior of the various Pathways, rather than ignoring that behavior. We assume that our task is to create an internet which is robust and which performs well, as opposed to one which "ought to" perform well but does not. It is true, as we have said, that within the Network Structure of the Catenet we want to regard the ARPANET as a Pathway whose internal structure we do not have to deal with, but that does NOT mean that we should regard it as a circuit. Any internet Host or Switch (gateway), TO PERFORM WELL, will have to have a carefully designed and tuned Pathway Access Protocol module geared to the characteristics of the Pathway that it accesses.

The relationship between the Pathway Access Protocol and the Network Access Protocol does offer a number of interesting problems. For one thing, it appears that these protocols do not fit easily into the OSI Open Systems model. If we are accessing a single packet-switching network, the Network Access Protocol appears to be a level 3 protocol in the OSI model, and the Pathway Access Protocol appears to be a level 2 protocol. However, if we are accessing an internet, we still need the level 3 Network Access Protocol, but now the Pathway Access Protocol also has a level 3 component, in addition to its level 2 component. So the Host is now running two different level 3 protocols, although the Network Access Protocol appears functionally to be in a layer "above" the level 3 part of the Pathway Access Protocol.
Perhaps the main problem here is that the OSI model has insufficient generality to capture the structure of the protocols needed to access an internet like the Catenet.

It is interesting to see how some of these considerations generalize to the case of a Host which needs to access an internet (call it "B") through a Pathway which is itself an internet (call it "A"). Then the Host needs a Network Access Protocol for the internet B, a Network Access Protocol for the internet A (which is also its Pathway Access Protocol for internet B), and a Network Access Protocol for the actual network to which it is directly connected, which is also its Pathway Access Protocol for internet A. As we create more and more complicated Network Structures, with internets piled on top of internets, the Hosts will have a greater and greater protocol burden placed upon them. Ultimately, we might want to finesse this problem by removing most of this burden from the Hosts and putting it in the Switches, giving the Switches knowledge of the hierarchical nature of the (internet) Network Structure. For example, a Host on the ARPANET might just want to give its data to some IMP to which it is directly connected, without worrying at all about whether that data will need to leave the ARPANET and travel via an internet. The IMP could decide whether that is necessary, and if so, execute the appropriate protocol to get the data to some internet Switch at the next highest level of hierarchy. If the data cannot reach its destination within the internet at that level, but rather has to go up further in the hierarchy to another internet, the Switch at the lower level could make that decision and execute the appropriate protocol.
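The hierarchical hand-off just described might be sketched as follows. The level numbers and reachability tables are invented for illustration (one might think of level 0 as the ARPANET, level 1 as the Catenet, and so on); the point is only that each entity tests deliverability at its own level and otherwise passes the data one level up.

```python
# A toy model of the hierarchical hand-off: each Switch knows only its own
# level's destinations and how to reach the next level up. Levels and
# tables here are invented for illustration.

def forward(destination: str, level: int, reachable: dict):
    """Deliver within this level if possible; otherwise hand the data up to
    the Switch at the next level of the hierarchy, which repeats the test.
    'reachable' maps each level to the destinations deliverable there."""
    if destination in reachable[level]:
        return (level, destination)        # deliverable without going higher
    if level + 1 in reachable:
        return forward(destination, level + 1, reachable)
    raise ValueError("destination unreachable at any level of the hierarchy")

# The Host only ever presents its data at level 0; the Switches make all
# further decisions about climbing the hierarchy.
reachable = {0: {"HOST-A"}, 1: {"HOST-B"}, 2: {"HOST-C"}}
```

Note that each level needs only the adjacent level's access protocol, which is exactly what relieves the Host of the stacked protocol burden described above.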
With a protocol structure like this, we could have an arbitrarily nested internet, and the Switches at a particular level, as well as the Hosts (which are at the lowest level), would only have to know how to access the levels of hierarchy which are immediately above and/or below them. This procedure would also make the host software conform more to the OSI model, since only one Network Access Protocol would be required. However, this sort of protocol structure, convenient as it might be for the Hosts, does not eliminate any of the issues about how to use the Pathways of a Network Structure most efficiently. Rather, it just pushes those issues up one level, and makes the Switches correspondingly more complicated. A proper understanding of the issues, therefore, is independent of what sort of protocol structuring we design.

Having emphasized the need for hosts and gateways to take account of the details of particular Pathway Access Protocols, we must point out that this is not always a simple thing to do. If the Network Structure underlying a Pathway is just a single network, like the ARPANET, this problem is not so terribly difficult, since one can expect that there will be available a lot of experience and information about what a host should do to access that network efficiently. If, on the other hand, the Pathway is really an internet itself, the problem is more difficult, since it is much harder to say anything substantive about its characteristics. This is a point we must keep in mind as we discuss specific issues in access protocol design.

In the remainder of this paper, we will attempt to deal with a number of issues involved in the design of robust, high-performance Network and Pathway Access Protocols. We will not attempt to cover every possible issue here.
In particular, the issue of how to do addressing is important enough to warrant a paper of its own, and shall be put off until the next paper in this series. We will attempt throughout to focus on issues which particularly affect the reliability of the internet configuration (as perceived by the users), and on issues which affect the performance of the internet (as perceived by the users). Wherever possible, we will try to exhibit the way in which the reliability and performance of a protocol trade off against its functionality. If protocol designers concentrate too heavily on questions of what functionality is desired, as opposed to what functionality can be provided at a reasonable level of performance and reliability, they are likely to find out too late that the protocol gives neither reasonable performance nor reliability.

2.1 Pathway Up/Down Considerations

In general, a Host will be multi-homed to some number of Switches. In fact, it is easy to imagine a Host which is both (a) multi-homed to a number of IMPs, within the Network Structure of the ARPANET (this cannot be done at present, but is planned for the future), and also (b) multi-homed to a number of gateways (namely, all the gateways on the ARPANET) within the Network Structure of the Catenet. Whenever a Host is multi-homed to a number of Switches in some Network Structure, it has a decision to make, namely, which of those Switches to use as the source Switch for some particular data traffic. In order to make this choice, the very first step a Host will have to take is to determine which Switches it can reach through operational Pathways. One thing we can say for sure is that if a Host cannot reach a particular Switch through any of its possible Pathways, then it ought not to pick that Switch as the source Switch to which to send its data.
In a case, for example, where the ARPANET is partitioned, a Host on the ARPANET which needs to send internet traffic will want to know which gateways it can reach through which of its ARPANET interfaces. To make this determination possible, there must be some sort of "Pathway Up/Down Protocol", by which the Host determines which of its potential Pathways to gateways are up and which are down. This is not to say, of course, that the Hosts have to know which gateways are up and which are down, but rather that they must know which gateways they can and cannot reach.

Of course, this situation is quite symmetric. The Switches of a Network Structure (and in particular, the gateways of an internet) must be able to determine whether or not they can reach some particular host at some particular time. Otherwise, a gateway might send traffic for a certain Host over a network access line through which there is no path to that Host, thereby causing unnecessary data loss. Apparently, this problem has occurred with some frequency in the Catenet; it seems worthwhile to give it some systematic consideration.

The design of reliable Pathway up/down protocols seems like something that "ought to be" trivial, but in fact can be quite difficult. Let's begin by considering the case of an ARPANET host which simply wants to determine whether it can reach some IMP to which it is directly connected. The first step for the host to take (if it is a local or distant host) is to look at the status of its Ready Line. If the Ready Line to some IMP is not up, then it is certain that communication with that IMP is not possible. If the host is a VDH host, then there is a special up/down protocol that the host must participate in with the IMP, and if that fails, the host knows that it cannot communicate with the IMP.
Of course, these situations are symmetric, in that the IMP has the same need to know whether it can communicate with a host, and must follow the same procedures to determine whether this is the case.

However, even in these very simple cases, problems are possible. For example, someone may decide to interface a host to an IMP via a "clever" front-end which hides the status of the Ready Line from the host software. If a host is multi-homed, and has to choose one from among several possible source IMPs, but cannot "see" the Ready Lines, what would stop it from sending messages to a dead IMP? Eventually, of course, a user would notice that his data is not getting through, and would probably call up the ARPANET Network Control Center to complain about the unreliability of the network, which, from his perspective, is mysteriously dropping packets. From the opposite perspective, one must realize that such a front-end might also hide the status of the host from the IMP, so that the network has no way of knowing whether a particular host is currently capable of communicating with the network. This is especially likely to happen if the "clever" front-end takes packets from the network which are destined for a particular host, and then just drops them if the host is down, with no feedback to either IMP or host. If a host is multi-homed, but one of its access lines is down, this sort of configuration might make it quite difficult for the network to reach a reasonable decision as to which access line to use when sending data to that host. The lesson, of course, is that the status of the Ready Line should never be hidden from the host software, but it is hard to communicate this lesson to the designers of host software. Again, the issue is one of performance vs. functionality.
A scheme which hides the status of the Ready Line from IMP or host may still have the required (minimum) functionality, but it will just perform poorly under certain conditions.

This may seem like a made-up problem which would never occur, but in fact it has occurred. We once had a series of complaints from a user who claimed that at certain times of certain days he had been unable to transmit data successfully over the ARPANET. Upon investigation, we found that during those times the user's local IMP had been powered down, due apparently to a series of local power failures at the user's site. Of course, the IMP will not transmit data when it is powered down. But it was somewhat mysterious why we had to inform someone of a power failure at his own site; surely the host software could have detected that the IMP was down simply by checking the Ready Line, and so informed the users. When this user investigated his own host software (a very old NCP), he found that it would inform the users that the IMP was down ONLY if the IMP sent the host a message saying that it was going down. Since the IMP does not send a message saying that it is about to lose power, the host software, which apparently did not check the Ready Line as a matter of course, did not detect the outage. It looked to the user, therefore, as though the network had some mysterious and unreliable way of dropping packets on the floor. It seems that many hosts presently exist whose networking software is based on the assumption that the IMPs never go down without warning.

Hosts do sometimes have difficulty determining whether their Pathway to an IMP is up or down, even when it seems like this should be totally trivial to determine. Reliable network service requires, however, that host software and hardware designers not hide the status of the IMP from the host, or the status of the host from the IMP.
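The lesson can be reduced to a few lines of pseudocode-like Python. All names here are invented for illustration: the point is simply that a multi-homed host should consult the Ready Line of each access line before choosing a source IMP, rather than blindly sending messages to a dead one.

```python
# Minimal sketch, with invented names: choose a source IMP only from among
# the access lines whose Ready Line is actually up.

def choose_source_imp(ready_lines: dict):
    """ready_lines maps IMP number -> Ready Line status (True = up).
    Returns some IMP whose access line is up, or None if there is none,
    in which case the outage should be reported to the users rather than
    silently dropping their packets on the floor."""
    for imp, up in sorted(ready_lines.items()):
        if up:
            return imp
    return None
```

Host software built this way would have told the user in the power-failure story above that his IMP was down, instead of leaving the network to take the blame; keeping the Ready Line visible to the software is what makes this possible.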
This will become increasingly important as more and more hosts become multi-homed.

Of course, this is only a first step in a proper up/down determination. It is not impossible for a Ready Line to be up but for some problem in either IMP or host to prevent communication from taking place. So some higher level up/down protocol is also necessary. Some protocol should be defined by which Host and Switch can send traffic to each other, and require the other to respond within a certain time period. A series of failures to respond would indicate that proper communication is not possible, at least for the time being. It is important to note, though, that the need for a higher level up/down protocol does not obviate the need for the lower level procedure of monitoring the Ready Line. If the higher level procedure fails, but the Ready Line appears to be up, knowledge of both facts is needed for proper fault isolation and maintenance. Also important to notice is that if the lower level procedure indicates that the Pathway is down, the higher level procedure should not be run. This might not seem important at first glance, but in practice it often turns out that attempting to send traffic to a non-responsive machine results in a significant waste of resources that could be used for something more useful.

In the more general case, where a Host's Pathway to a source Switch may include one or more packet-switching networks, it is far from trivial to determine whether the Switch can be reached from the Host via the Pathway. Consider, for example, how a given ARPANET host could determine whether a given Catenet gateway on the ARPANET can be accessed via some given ARPANET source IMP. Of course, the first step is to determine whether communication with that source IMP is possible. Even if it is, however, the gateway might still be unreachable, since it may be down, or the network may be partitioned.
("Officially", every ARPANET Host is supposed to be reachable from any other ARPANET Host. However, the average connectivity of the ARPANET is only 2.5, which means that only a small number of line or node failures are needed to induce partitions. Furthermore, a few ARPANET sites are actually stubs, which means that a single failure can isolate them from the rest of the ARPANET. As often - 15 - IEN 187 Bolt Beranek and Newman Inc. Eric C. Rosen seems to happen in practice, the sites that are stubs seem to be attached by the least reliable lines, so that partitions are not infrequent. At any rate, there will probably be networks in the internet that partition more frequently than the ARPANET does. Internet protocols must detect and react to network partitions, instead of simply disregarding them as "too unlikely to worry about." ) In the special case where the Pathway between some Host and some Switch is the ARPANET, the ARPANET itself can provide information to the Host telling it whether the Switch is reachable. If the Switch is not reachable, and a Host attempts to send an ordinary data packet to it, the ARPANET will inform the Host whether or not that packet was delivered, and if not, why not. Unfortunately, the current ARPANET does not provide this information in response to datagrams. However, we have already seen the need to provide such information in the case of logically addressed datagrams (see IEN 183), and plan to implement a scheme for doing so. An ARPANET Host which is also an internet Host can implement a low level Pathway up/down protocol simply by paying attention to the 1822 replies that it receives from the ARPANET. There are hosts which seem to disregard these 1822 control messages, and which seem to continue to send messages for unreachable hosts into the ARPANET. Of course, this is a senseless waste of resources which can severely degrade performance. Indeed, it may look to an end-user, or even - 16 - IEN 187 Bolt Beranek and Newman Inc. 
Eric C. Rosen a gateway implementer, as though the ARPANET is throwing away packets for no reason, when the real problem is that the host software cannot respond adequately to exceptional conditions reported to it by the network. We have spoken of the need for Host and Switch to run a higher level up/down protocol, to take account of the conditions when one of them seems reachable to the network, but still will not perform adequately when another entity attempts to communicate with it. Switch and Host must run some protocol together which enables each to validate the proper performance of the other. The Catenet Monitoring and Control System (CMCC), currently running on ISIE, runs a protocol of this sort with the gateways. The CMCC sends a special datagram every minute to each gateway, and expects to receive an acknowledgment (or echo) for this special datagram back from the gateway. After three consecutive minutes of not receiving the echo, the CMCC infers that the gateway cannot be reached. After receiving a single echo, the CMCC infers that the gateway can be reached. (Gateways run a similar protocol with their "neighboring gateways".) A Pathway up/down protocol which does not rely on the intervening network to furnish the information would certainly have to involve some such exchange of packets between the Host and the Switch, but it would have to be rather more complex than this one. One of the problems with this protocol is that it is incapable of detecting outages of less than three minutes. This - 17 - IEN 187 Bolt Beranek and Newman Inc. Eric C. Rosen may be suitable for the CMCC's purposes, but is not generally suitable for a Host which wants to know which source Switch to send its traffic to. We would not want some Host to spend three full minutes sending data to a Switch which cannot be reached; the effect of that could be many thousands of bits of data down the drain. 
(Of course, higher level protocols like TCP would probably recover the lost data eventually through the use of Host-Host retransmissions, but that involves both a severe drain on the resources of the Host, which ought to be avoided whenever possible, and a severe degradation in delay and throughput.)

Another problem with this particular protocol is that it uses datagrams, which are inherently unreliable, and as a result the inference drawn by the CMCC is unreliable. From the fact that three datagrams fail to get through, it is quite a big jump to infer that no traffic at all can get through. Another problem is the periodicity of the test packets. If they get in phase with something else which may be going on in the network, spurious results may be produced.

The design of a Pathway up/down protocol must also be sensitive to the fact that some component network of a Pathway may be passing only certain types of packets and not others. For example, at times of heavy usage, certain networks may only be able to handle packets of high priority, and lower priority packets may either be refused by that net (at its access point), or, even worse, discarded internally by the net with no feedback. The Pathway up/down protocol must be sensitive to this, and will have to indicate that the Pathway is only "up" to certain classes of traffic. If a Pathway is really a Network Structure which will inform its Hosts when it cannot accept certain traffic types, then this information can be fed back into the up/down protocol. (Note, however, that this might be very difficult to do if the Pathway consists not of a single network, but of an internet.) Alternatively, a Host may have to rely on its higher level Pathway up/down protocol to determine, for several classes of traffic, whether the Pathway is up to members of that class.
Apart from the inherent difficulty of doing this, it may be difficult to map the traffic classes that a given component network distinguishes into traffic classes that are meaningful to a Host, or even to the Switches of the internet. Yet we wouldn't want traffic to be sent into a network which is not accepting that particular kind of traffic, especially if there are alternative Pathways which would be willing to accept it.

Many of these considerations suggest that the higher level up/down protocols could turn out to be rather intricate and expensive. Remember that a gateway may have many hosts "homed" to it, and must be able to determine, for each and every one of these hosts, whether communication with it is possible. Yet it probably is not feasible to suppose that each gateway can be continuously running an up/down protocol with each potential host, and still have time left to handle its ordinary traffic. This suggests that the primary up/down determination be made from the low-level protocol, i.e., that the Switches should rely on the networks underlying the Pathways to inform them whether a given Host is up or down, and the Hosts should similarly rely on the networks underlying the Pathways to pass them status information about the gateways. It would be best if the higher level up/down protocol only needed to be run intermittently, as a check on the reliability of the lower level protocol.

Unfortunately, the use of low level up/down protocols is not always possible. Many networks, unlike the ARPANET, do not even gather any information about the status of their hosts, and hence cannot inform a source Host that it is attempting to send data to a destination Host which is not reachable. (SATNET is an example of a network that does not pass "destination dead" information.) In the case where a particular Host-Switch Pathway is itself an internet, the problem is even worse.
Unless the component networks of that internet can be made to cooperate in obtaining RELIABLE up/down information and passing it back to the source Host, it will be very hard for a Host to make any reasonable determination as to whether a particular Switch is reachable. We would strongly recommend the incorporation of low level up/down protocols in ALL component networks of the internet.

There is another important problem in having a Host determine which of its potential source Switches on the internet are up and which are down. In order to run a protocol with the Switch, or even to query the lower level network about the Switch, the Host must have some way of identifying the Switch. It is not so difficult for a Host on the ARPANET to identify the IMPs that it is directly connected to, since it is quite simple to devise a protocol by which a Host can send a message down each of its access lines, asking who is on the other end. It is rather more difficult for a Host to find out which gateways it is homed to (i.e., which gateways are on a common network with it). There is no easy way for an ARPANET Host to find out which other ARPANET hosts are Catenet gateways. There is no "direct connection" at which to direct protocol messages. In the current Catenet, hosts have to know in advance how to identify the Catenet gateways on their networks (although there are certain restricted circumstances under which a host can obtain the name of another gateway from a gateway about which it already knows). Yet it does not seem like a good idea to require a Host to know, a priori, which other Hosts on its network are also internet Switches. This makes it difficult to enable Hosts to take advantage of newly installed gateways without making changes by hand to tables in the Hosts (a procedure which could require weeks to take effect). There is a rather attractive solution to this problem.
If each component network in the internet can determine for itself which of its Hosts are also internet Switches (gateways), then the Switches of that network can provide that information to the Hosts. This would require the existence of a protocol which the gateways run with the Switches of the individual component networks, by means of which the gateways declare themselves to be gateways. Each individual network would also have to have some internal protocol for disseminating this information to other Hosts, and for keeping this information up to date. If the network allows GROUP ADDRESSING, further advantages are possible. The network could maintain a group address (called, say, "Catenet Gateways") whose membership varies dynamically as gateways enter and leave the network. Hosts could find out which gateways are reachable over particular network access lines by sending some sort of protocol message to the group address, and waiting to see who replies. Hosts would then not need any a priori knowledge of the gateways on their home networks.

One very important though often neglected aspect of up/down protocols is the way in which the up/down protocol interacts with the ability to perform adequate maintenance of the Network Structure. It is tempting to think that a Pathway up/down protocol ought to declare a Pathway "down" only if it is totally dead or otherwise totally unusable. But in fact, a Pathway should be declared down before it becomes totally dead, if its packet "non-delivery rate" exceeds a certain threshold. (We use the term "non-delivery rate" where the term "error rate" is more commonly used. We are trying to emphasize that it is important to detect not only errors, in the sense of checksum errors, but rather any circumstances, including but not limited to checksum errors, which prevent the proper delivery of packets.)
There are two reasons for this:

1) The existence of a non-zero non-delivery rate on a Pathway implies that some packets placed on that Pathway will not make it through to the other end. In most applications, these packets will have to be retransmitted at some higher level of protocol, or else by the end user himself (packetized speech is one of the few exceptions to this). As the number of retransmissions increases, the delay also increases, and the throughput decreases. So when the non-delivery rate reaches a certain point, the Pathway should be removed from service, in order to improve delay and throughput. Of course, this assumes that an alternate Pathway is available with a lower non-delivery rate. Also, other things being equal, removing bandwidth from a Network Structure will also tend to increase delay and reduce throughput, so we really want the up/down protocol to pick out the proper cross-over point.

2) It is often better to fix a Pathway at the first sign of trouble than to wait for it to fail totally. One implication of this is that the up/down protocol should perform equally well whether or not the Pathway is heavily loaded with traffic. We would not want to use a protocol which made its determination solely by making measurements of user traffic, since that protocol would not function well during periods when user traffic is very light. That is, a faulty Pathway with no user traffic would not be detected. Yet if repair work has to be done on a Pathway, we would most like to find out about it during lightly loaded hours, so that a fix can be effected with minimal disruption, possibly before the heavily loaded hours begin.

Another important characteristic for a Pathway up/down protocol to have is the ability to determine the nature of the Pathway "outage". This is quite important for fault isolation, but is easy for a host software person to overlook, since he may not be aware of such issues.
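The rule argued for in point 1) above can be sketched in modern notation. This is a minimal sketch, not a defined protocol; the threshold value, the counting scheme, and the function name are all invented assumptions. Note the caveat of point 2): with no user traffic there is no evidence either way, so a real protocol would have to generate test traffic of its own.

```python
# Hypothetical sketch of a threshold-based Pathway up/down rule.
# The 5% threshold and the window counts are invented for illustration.

def pathway_status(sent, delivered, threshold=0.05):
    """Return 'up' or 'down' from packet counts over a measurement window.

    We deliberately speak of a "non-delivery rate" rather than an
    "error rate": anything that prevents delivery counts against the
    Pathway, not just checksum errors.
    """
    if sent == 0:
        # No user traffic means no evidence; a real protocol would
        # have to probe with its own test traffic (see point 2 above).
        return "up"
    non_delivery_rate = (sent - delivered) / sent
    return "down" if non_delivery_rate > threshold else "up"
```

The point of the threshold is that the Pathway is declared down well before it is totally dead, so that repair can begin during lightly loaded hours.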
If a Host cannot get its packets to a Switch over a certain Pathway, it will want to regard that Pathway as down, and will want to use an alternate Pathway. From the Host's perspective, it doesn't matter whether the reason it can't use the Pathway is a network partition, network congestion, or something else. However, if the Host personnel want to be able to call up the Pathway personnel and request that the problem be fixed, it's not enough to say, "Your network isn't working; call me back when it's fixed." The more information the Pathway up/down protocol can gather, the quicker a fix can be effected. In the case where the Pathway is the ARPANET, quite a bit of information can be gathered from proper instrumentation of the 1822 module, and proper attention by the host software to the 1822 replies; this will be discussed further in section 2.6.

The design of the ARPANET's line up/down protocol might be a good model for the design of a general Pathway up/down protocol. The design of the ARPANET protocol was based upon a mathematical analysis of the probabilistic error characteristics of telephone circuits, and the protocol is intended to bring a line down when and only when its error rate exceeds a threshold. However, the error characteristics of Pathways in general (i.e., of packet-switching networks) are not well understood at all, and there is no similar mathematical analysis that we can appeal to.

At present, we can offer no ready answer to the question of how a Host can tell which of several possible source Switches is reachable, if the Switches are accessed via a network (or sequence of networks) which will not even inform the Host whether or not its traffic gets delivered. This is an important question which will require further thought, and considerable experimentation.

2.2 Choosing a Source Switch

Once a Host has determined which source Switches it can reach over which of its interfaces, it still has to determine which one to use for sending some particular packet (unless the Host is "lucky" enough to find out that only one source Switch is reachable). Making the proper choice can be quite important, since the performance which the Host gets may vary greatly depending upon which source Switch it selects. That is, some source Switch might be much closer to the destination, in terms of delay, than another, so it might be quite important to choose the proper one.

To make things a bit more concrete, consider the case of a Host which is multi-homed (via several distinct 1822 links) to several ARPANET IMPs, and whose traffic can be handled entirely within the ARPANET. There are several things a host might want to take into account in choosing the best source IMP to use for a particular packet, including:

1) The loading on the 1822 access line to each possible source IMP.

2) The distance between each source IMP and the destination Host, for some notion of "distance."

The first of these two quantities is relatively easy to obtain, since all the Host need do is monitor its own 1822 lines; it should be possible to devise a monitoring scheme which indicates which of the 1822 lines is providing the best service to its IMP, perhaps simply by measuring the queuing delay experienced in the Host by messages queued for that line. (Any such measurement would have to take into account some of the niceties of the 1822 protocol, though.) Obtaining information about the second quantity is more difficult. The Host might try to keep some measurement of round-trip delay (delay until a RFNM is received) between itself and each destination Host.
However, in order to do this, some traffic for each destination Host would have to be sent over each access line, so that the delay could be measured. This means that some traffic has to be sent over a long delay path, simply in order to determine that it is a long delay path. A simpler scheme might be for the Host to get delay information from the IMP. A Host could ask each potential source IMP what its delay to the destination Host is. By using this information, plus the information it gathers locally about the loading of its access lines, the Host could determine which source IMP provides the shortest path to the destination. This would require that we define a protocol by which a Host can ask the IMPs to which it is homed to provide their delays to a destination Host. The Host could make these requests periodically, and then change its selection of source IMPs as required in order to react to changes in delay.

There are a few subtle protocol issues to be considered here, though. We would have to make sure that a Host cannot beat a Switch to death by constantly asking it what its delays are; probably we would have to give the Switch the option of not replying to these requests if it is too busy with other things (like ordinary data traffic). A bigger problem lies in the assumption that the Switches will even have this data to provide. The routing algorithm used by the ARPANET IMPs does, in fact, provide each IMP with a value of delay, in milliseconds, to each other IMP in the network. There is no reason why this information could not be fed back to the hosts on request. Note, however, that while a source IMP knows its delay to each possible destination IMP, it does not know its delay to each potential destination HOST over each possible access line to that Host, since the routing algorithm does not maintain measurements of delay from an IMP to a locally attached host.
Yet this latter delay might be quite significant. Still, the information that the ARPANET IMPs could provide to the Hosts should enable them to make a better choice than they could make without this information.

Another problem with this idea of having the Switches feed back delay information to the Hosts is the proper choice of units. If a Host is going to take the delay information provided by the network and then add some locally measured delay information to it, it is important for the Host to know what units the network is using to measure delay. Yet we also have to ensure that the network developers and maintainers are free to change the way in which the network does measurements, and the units in which the measurements are taken, WITHOUT NEEDING TO COORDINATE SUCH CHANGES WITH ALL HOST ADMINISTRATIONS. That is, we don't want further development of the network, and further refinements in the way network measurements are done, to be overly constrained by the fact that the Hosts demand measurements in a certain unit. We also want to ensure that host software implementations are not invalidated by a decision to change the units that the network uses for its internal measurements. So the protocol would have to enable the Switch to tell the Host what units it is providing; the Host would then make any necessary conversions. (Alternatively, the Host could tell the Switch what units it wants, and the Switch could do the conversion before sending the information to the Host.)

In the internet environment, the situation is more complicated. An ARPANET Host which is also an internet Host would have to (a) figure out its delay to each of its source IMPs, (b) query each source IMP for its delay to each source gateway, and (c) query each source gateway about its delay to each destination.
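The selection scheme discussed above, with the unit-conversion provision, can be sketched as follows: each Switch reports its delay to the destination together with the unit in which it measured, the Host converts everything to a common unit, adds its locally measured access-line delay, and picks the minimum. The switch names, unit strings, conversion table, and delay figures are all invented for illustration.

```python
# Hypothetical sketch of source-Switch selection with unit conversion.
# Nothing here is a defined protocol; it only illustrates the bookkeeping.

TO_MS = {"ms": 1.0, "s": 1000.0, "ticks": 25.6}   # e.g. a 25.6 ms clock tick

def choose_source_switch(local_delay_ms, reported):
    """local_delay_ms: {switch: access-line delay measured by the Host}
    reported: {switch: (value, unit) as returned by that Switch}.
    The Switch names its units, so the network may change them freely."""
    total = {sw: local_delay_ms[sw] + value * TO_MS[unit]
             for sw, (value, unit) in reported.items()}
    return min(total, key=total.get)

best = choose_source_switch(
    {"imp-5": 3.0, "imp-72": 1.0},                 # monitored locally
    {"imp-5": (40.0, "ms"), "imp-72": (4.0, "ticks")})
```

Because the conversion lives in the Host, the network can later switch from milliseconds to clock ticks without invalidating host software, which was the whole point of carrying the unit in the protocol.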
There is no straightforward way to gather the rest of the needed delay information, however, namely the delay from the destination gateway to the destination Host. In more complex Network Structures, with internets nested on top of internets, this problem becomes increasingly complex. It seems that the only really reliable way, and the most straightforward way, for the source Host to gather information about the delays via various source Switches to a destination Host, is for it to do the measurements itself. This is the recommended solution. Delay information should also be made available from the component networks for Hosts which cannot do this, but it should be understood that those hosts cannot expect to get as good a quality of service as the hosts which go to more trouble to do their own measurements.

2.3 Type of Service

One very important piece of information that a Host must specify to the source Switch through the Network Access Protocol is the "type of service" desired. To quote from the DoD standard Internet Protocol (IP) specification [1, p. 15], "The Type of Service is used to indicate the quality of the service desired; this may be thought of as selecting among Interactive, Bulk, or Real Time, for example." This seems to make sense, since one does have the feeling that different types of applications will fall into different categories, and information about the categories may help the Switches of the Network Structure through which the data is moving decide how best to treat it. However, choosing just the right set of categories of service is quite a complex matter. For example, both a terminal user of a time-sharing system and a user of a query-response system (like an automated teller) fall under the rubric of "interactive", but that doesn't mean that their service requirements are the same. Both Remote-Job-Entry and File Transfer fall under the rubric of "bulk", but it is not obvious that they have the same requirements. Both real-time process control and packetized voice fall into the category of "Real Time", but the requirements of these two applications seem to be very different. A very real issue, which has not yet been given adequate consideration, is the question of just how many categories of application type there really should be, and just what the implications of putting a packet into one of these categories ought to be. As we go on, we will see a number of problems that arise from failure to carefully consider this issue.

It is rather difficult to find examples of Network Access Protocols which have really useful class-of-service selection mechanisms. The 1822 protocol allows the user to select between two priorities; it allows the choice of single-packet or multi-packet messages; and it allows the choice between "raw packets" and "controlled packets." It is up to some user (or, more realistically, up to some host software implementer who may have only a vague and limited understanding of the applications which his software will serve, and of the network that he is accessing) to map his application characteristics onto these three choices. Unfortunately, it is doubtful that there is anyone outside of the ARPANET group at BBN with any clear understanding of the implications of making the various choices. The task of making the optimum choice for some application is further complicated by the fact that the effects of making the various choices can be very dependent on the network load. For example, it is often possible to get more throughput from single-packet messages than from multi-packet messages.
This will happen if the destination IMP has several different source Hosts sending multi-packet messages to it, but is short on buffer space (as many of the ARPANET IMPs are), and if the multi-packet messages contain only two or three packets per message. Not only is this sort of thing very difficult for an arbitrary user to understand (to a naive network user, it must seem ridiculous), it is also subject to change without notice. Although users can vary their service significantly by sending optimum size messages, the principles governing the "optimum" size are very obscure, and we cannot really expect users to map their application requirements onto this network feature in any reasonable manner.

A similar problem arises with respect to the priority bit that the 1822 protocol allows. Basically, a priority packet will get queued ahead of any non-priority packets on the queues for the inter-IMP links and on the queues for the IMP-Host access lines. However, priority packets receive no special preference when competing with non-priority packets for CPU cycles or for buffer space. Also, there is no notion at all in the ARPANET of refusing to accept low priority packets because the network is already too heavily loaded with high priority packets. Although someone who has carefully studied the ARPANET might be able to say what the effect of setting the priority bit is under some particular set of circumstances, a user who is wondering whether his application requirements are best served by setting the priority bit really has no way of answering that question. The actual effect of the priority bit does not fully correspond to any intuitive notion of priority that an arbitrary user is likely to have.

Another problem: although it is presently allowed, it is not really a good idea to let the users choose whether to set the priority bit or not. Fortunately, most hosts do not submit packets with the priority bit on.
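The queueing behavior described above can be made concrete with a small sketch: a priority packet goes ahead of the non-priority packets already waiting on a link queue, but behind other priority packets, and (as noted) it gets no special claim on CPU cycles or buffer space, which this model deliberately omits. The class and method names are our own invention, not part of 1822.

```python
# Illustrative model of 1822-style priority on a single link queue.
from collections import deque

class LinkQueue:
    def __init__(self):
        self.q = deque()                    # entries are (priority_flag, packet)

    def enqueue(self, packet, priority=False):
        if priority:
            # Insert after the last waiting priority packet,
            # but ahead of all non-priority packets.
            i = 0
            while i < len(self.q) and self.q[i][0]:
                i += 1
            self.q.insert(i, (True, packet))
        else:
            self.q.append((False, packet))

    def dequeue(self):
        return self.q.popleft()[1]
```

Even this toy makes the document's point visible: priority reorders a queue, but it does nothing about admission, buffers, or processor time, so it matches no user's intuitive notion of "priority service."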
It wouldn't be terribly surprising, though, if some host software implementer decided that he would always set the priority bit, in order to get faster service. Of course, overuse of the priority bit just means that it will have no effect at all, and that seems to mean that its use must be controlled in some way, and not simply left up to each user, as in the 1822 protocol.

The IP offers even worse problems than 1822 in these respects. Like 1822, the IP does not really allow the user to classify his traffic according to application type. Rather, it forces him to pick one of 5 possible precedence values (from highest to lowest precedence, whatever that means, exactly), to pick one of 4 reliability values (from most to least reliable), to indicate whether he wants his data to be stream data or datagram data in component networks for which this distinction is meaningful, to indicate whether he wants high or low speed, and to indicate whether speed is more important to him than reliability is. The idea here, apparently, is that any user can map his application requirements into certain abstract properties, and the information which the IP passes from the Host to the source Switch is supposed to indicate which of these abstract properties the user needs. At each internet hop, these abstract properties are supposed to be mapped to particular properties that are meaningful to the network in question. The Pathway Access Protocol for that network would then be used to indicate to the Switches of that component network what particular properties the data transfer should have within that network. In fact, the only apparent use of the "type of service" information in the internet Network Access Protocol (IP) is to carry information to be passed to the individual Pathway Access Protocols.
This all sounds reasonable enough when considered in the abstract, but it gives rise to a large number of vexing problems when we attempt to consider particular ways in which this "type of service" information is to be used. Empirically, it seems that few current gateway implementations take any notice of this information at all. We suggest that the problem is not that the individual implementers have not had time to write the code to take account of this information, but rather that it is far from clear how this information should be handled, or even that this information is really meaningful. We suggest further that an internet user would also have a great deal of difficulty deciding how to specify the "type of service" information in order to get a specific quality of service needed by his application.

Suppose a user needs the maximum possible speed for his application, so he uses IP to indicate that he values speed above all else. What would the current Catenet do? For concreteness, suppose there is a choice of sending this user's data either via a sequence of four low-delay terrestrial networks, or through three satellite networks, each of which contains two satellite hops. The current implementation of the Catenet would send the data through the three satellite networks. However, since the user indicated that he values speed above all else, he will get the fastest service that each of the satellite networks can provide! Of course, this may not be what the user will have expected when he asked for speed, since the fastest service through a satellite network is not fast. A user may well wonder what the point of specifying speed is, if his data is going to traverse some sequence of satellite networks, even if a much faster path is available. Furthermore, it is not correct to assume, in general, that a user who values speed will really want the speediest service through every network.
If traffic must go through a satellite network, it may be important to try to get one-hop rather than two-hop delay, if this is possible. Yet it may not be economical to also try to get the speediest service through all terrestrial networks; the difference between high and low speed service through a terrestrial network might be "in the noise", even when compared to the shortest delay through the satellite network. It is not impossible, or even unlikely, that better overall service (or more cost-effective service) can be achieved by using the fastest possible service through some networks, but less than the fastest through others.

There are two immediate lessons here. First, the characteristics that a user specifies in the Network Access Protocol may require some interaction with routing, since the characteristics he desires simply cannot be provided, in general, by sending his traffic through a random series of networks, and then mapping information he specifies in the Network Access Protocol into information carried in the individual Pathway Access Protocols. Second, what a user means intuitively by "speed" just may not map into what some particular component net means by "speed". Once again, we see that the basic problem stems from the differing characteristics of the Pathways in the Network Structure.

Another peculiar feature of the IP is the mysterious "S/R bit", which a user is supposed to set to indicate whether he prefers speed over reliability, or vice versa, should these conflict. One unsuitable aspect of this is the apparent assumption that it even makes sense to prefer either speed or reliability over the other, without specifying more detail. It is easy to imagine that some user is willing to accept reliability of less than 100% if he can increase his speed somewhat.
It is also easy to imagine that a user would be willing to accept somewhat slower service if it gives him higher reliability. But there will always be a range that the user wants to stay within. If his reliability must be moved below a certain threshold in order to get more speed, he may not want this, even if he would be willing to say that he prefers speed to reliability. Similarly, if his delay must go above a certain threshold to gain more reliability, he may not want this, even if, when talking in general terms, he says that he needs reliability more than speed. It really doesn't make any sense at all to try to map a particular application type into "speed over reliability" or "reliability over speed", unless ranges and thresholds are also specified. What this means in practice is that a user will not be able to make a reasonable choice of how to set this bit in the IP header; whatever he sets it to is bound to produce results other than those he expects under some not too uncommon set of circumstances.

We do not want to leave unquestioned the tacit assumption that speed and reliability are opposing virtues, so that increasing one must be expected to decrease the other. To quote again from the IP spec, "typically networks invoke more complex (and delay producing) mechanisms as the need for reliability increases" [1, p. 23]. This reasoning is somewhat superficial. It may be true that in some networks the less reliable kinds of service are speedier, but this is not invariably the case. To see this, consider the following (fictitious) network. This network allows the user to request either "reliable" or "unreliable" data transfer. Reliable packets are controlled by a set of protocols, both at the end-end and hop-hop level, which ensure delivery. Unreliable packets are not under the control of any such protocols.
Furthermore, reliable packets go ahead of unreliable ones on all queues, in particular the CPU queue. In addition, unreliable packets can be flushed from the net at any time, if some resource they are using (such as buffer space) is needed for a reliable packet. These latter two measures are needed to ensure that the net does not become so heavily loaded with unreliable packets that there is no room for the reliable ones. (It would not make much sense to advertise a "reliable" service, and then to allow the unreliable packets to dominate the network by using most of the network resources. If unreliable packets could grab most of the resources, leaving the "reliable" ones to scavenge for the left-over resources, then it would be virtually inevitable that the service received by the "unreliable" packets would appear, to the users, to be more reliable than the service received by the "reliable" packets. To achieve a true dichotomy between reliable and unreliable service, the reliable packets must be given priority in all respects over the unreliable ones. We should also remember, by the way, that although many protocols combine features of reliability, sequentiality, error control, and flow control, these are not the same, and there is no reason why a network might not offer a reliable but unsequenced service.)

This sort of network design seems quite reasonable, perhaps more reasonable than the design of any existing network. It would allow for a (presumably inexpensive) class of service ("unreliable") which would be able to use only those network resources not needed by the more reliable (and expensive) class of packets, and which would not suffer any additional delay due to the presence of the protocols which would be needed to ensure reliability.
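The buffer discipline of this fictitious network can be sketched as follows: a reliable packet may preempt buffer space held by an unreliable one, which is simply flushed. The class, its capacity, and the packet representation are all illustrative assumptions, not a description of any real network.

```python
# Sketch of the fictitious network's buffer discipline: reliable
# packets preempt buffer space held by unreliable ones.

class BufferPool:
    def __init__(self, capacity):
        self.capacity = capacity
        self.reliable, self.unreliable = [], []

    def accept(self, packet, reliable):
        """Try to buffer a packet; returns True if it was accepted."""
        if len(self.reliable) + len(self.unreliable) < self.capacity:
            (self.reliable if reliable else self.unreliable).append(packet)
            return True
        if reliable and self.unreliable:
            self.unreliable.pop(0)        # flush an unreliable packet
            self.reliable.append(packet)
            return True
        return False                      # pool full; packet refused
```

Unreliable packets use only the resources left over by reliable ones, so the "unreliable" class costs the network nothing it could not reclaim, which is exactly what makes the design attractive.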
In such a network, unreliable packets might well experience less delay than reliable ones WHEN THE NETWORK IS LIGHTLY LOADED; WHEN IT IS HEAVILY LOADED, HOWEVER, RELIABLE PACKETS WOULD TEND TO EXPERIENCE THE SMALLER DELAY. If this is the case, it is hard to see how a user could be expected to make a reasonable choice of IP service parameters at all. He may know what his needs are, but we can hardly expect him to know how to map his needs onto particular aspects of the behavior of a particular network component of an internet, especially when the behavior determined by that mapping will vary dynamically with the network loading, and hence with the time of day.

Two other peculiarities of the "type of service" feature of the IP are worth mentioning. First, there seems to be no notion of the relation between speed and priority, though in many networks the priority of a message is the major determinant of its speed. (There are, to be sure, networks which attempt to treat priority solely as "acceptance class", differentiating it completely from considerations of speed. However, we know of no network implementation which has been shown to differentiate SUCCESSFULLY between these two concepts, and there is reason to doubt that this differentiation is even possible in principle.) Second, one of the choices to be made is whether to prefer stream or datagram service. This is a clear example of something that is not based on "abstract parameters of quality of service", but rather on a particular feature of one or two particular networks. Requesting stream service will NOT do what a user might expect it to do, namely set up a stream or virtual circuit through the entire internet. This would require a lengthy connection set-up procedure, involving reservations of resources in the gateways, which resources are to be used only for specific connections.
If we are really serious about providing stream service, this is just as important as obtaining stream service within the component networks serving as the Pathways of the internet. Indeed, it is hard to imagine any real use for an internet "stream service" which treats packets as datagrams during most of their lifetime in the internet, and then treats them as stream packets in one or two component networks. It must be remembered that the sort of stream service provided by a network like SATNET is only useful to a user if his data appears at the SATNET interface at fixed periods, synchronized with the scheduling of the stream slots on the satellite channel. If the data must first travel through several datagram networks before reaching SATNET, IT IS VIRTUALLY IMPOSSIBLE THAT THE DATA WILL ARRIVE AT SATNET WITH THE PROPER PERIODICITY to allow it to make proper use of the SATNET stream.

Now there are certain specific cases where it might be possible to provide some sort of stream service, say if some data is going from a local network through SATNET to another local network and thence directly to the destination Host. (Though even in this case, some sort of connection set-up and reservation of resources in the gateways between SATNET and the local networks would probably be necessary.) Note, however, that if a user requests this type of service, he is also constraining the types of routes his data can travel. If SATNET is not available, he might not want to use the internet at all at that time. Or he might be willing to tolerate a less optimal route ("half a loaf is better than none"), but might not want "stream service" if the less optimal route has to be used. In no case can a type of service like "stream" be obtained simply through the mapping of "type of service" in the internet onto "type of service" in the component networks.
We do not want to have a Network Access Protocol that will need to be infinitely expandable, so that the user can indicate the type of service he wants in each particular network that his data may eventually travel through. For one thing, as the internet becomes larger, so that there are more paths between each possible source and destination, the users will not generally know what set of networks their data will travel through. Since the number of component networks in the internet may be continually increasing, and since we cannot anticipate in advance the features that each new network may offer, it does not really seem reasonable to have to keep adding fields to the IP to account for particular characteristics of each new component network. Yet this seems inevitable with the current approach. That is, we do not agree with the claim in the IP spec that the type of service field in the IP indicates "abstract parameters". Rather, we think the type of service field has been constructed with certain particular networks in mind, just those networks which are currently in the Catenet, and that the various service fields have no meaning whatsoever apart from the particular "suggested" mappings to protocol features of specific networks given in the spec. (And since these mappings are only "suggested", not required, one might wonder whether the type of service field really has any consistent meaning at all.) This situation is perhaps tolerable in a research environment, where most of the users of the internet are explicitly concerned with issues of networking, and willing to try a large number of experiments to see what sort of service they get.
One must remember, however, that in a truly operational environment, the average user will not be concerned at all about networking, will not know anything about networking, will not care about networking, and will only want the network to appear transparent to him. In order for such a user to make successful use of the type of service field in a Network Access Protocol, the parameters of the field must be meaningful to him. If they are only meaningful to network experts, the user will never be able to figure out how best to set these parameters. Rather than providing a type of service specification which is nothing but a sort of "linear combination" of the types of service provided by the component networks, the internet ought to offer a small, specific number of service types which are meaningful at the application level. The possible values of internet service type might be "interactive session," "transaction," "file transfer," "packetized speech," and perhaps a few others. The categories should be simple enough so that the user can figure out which category his particular application falls into without needing to know the details of the operation of the internet. The Switches of the internet should take responsibility for sending the data on a route which is capable of providing the requested type of service, and for sending the data through component networks of the internet in a way which maximizes the possibility that the type of service requested will actually be achieved. Of course, in order to do this, we must first answer a couple of hard questions, such as "Exactly what characteristics of service do users want and expect for particular applications?", and "What features must the internet Switches have, and what features must the component networks have, in order to provide service with the necessary characteristics?"
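To make the suggestion concrete, the small set of application-level service classes proposed above might be represented as follows. This is a sketch in Python, purely for illustration; the attribute names and the particular flag assignments are our assumptions, not anything drawn from the IP spec.

```python
# Illustrative, application-level service classes.  The flags attached
# to each class (low_delay, high_throughput, loss_tolerant) are
# assumptions made for this sketch; a real design would have to answer
# the "hard questions" in the text before fixing them.
from collections import namedtuple

ServiceClass = namedtuple("ServiceClass", "low_delay high_throughput loss_tolerant")

SERVICE_CLASSES = {
    "interactive session": ServiceClass(low_delay=True,  high_throughput=False, loss_tolerant=False),
    "transaction":         ServiceClass(low_delay=True,  high_throughput=False, loss_tolerant=False),
    "file transfer":       ServiceClass(low_delay=False, high_throughput=True,  loss_tolerant=False),
    "packetized speech":   ServiceClass(low_delay=True,  high_throughput=False, loss_tolerant=True),
}

def requirements(application_class):
    """What the Switches would have to provide for a requested class."""
    return SERVICE_CLASSES[application_class]
```

The point of such a scheme is that the user names only his application; it is the Switches, not the user, which translate the class into routes and component-network features.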
In order to give adequate communications service in an operational environment, however, these questions must be given careful consideration by internet designers. To some extent, these questions are difficult research issues, and answering them will require doing some systematic experimentation and instrumentation in the internet. The problem is hard, but unavoidable. The IP's current approach seems aimed at side-stepping these issues, since it places the burden entirely on the user. It tends to give users the illusion that, by properly specifying the bit fields in the IP header, they can tune the internet to provide them with the specific type of service they find most desirable. This is, however, only an illusion. The perspective taken by the current IP seems to be not, "How should the internet be designed so as to provide the needed characteristics of service while providing a simple interface to the user?", but rather, "Taking the current design of the internet as a given, how can we give the user the ability to massage, bend, and twist it so as to get service characteristics which might be close to what he wants?" The former perspective seems much more appropriate than the latter. Although we are not at present prepared to offer an alternative to IP, there are several lessons we would like to draw from this discussion: 1) While an internet Network Access Protocol really does need to contain some field which indicates the desired type of service in a manner which is abstract enough to be mapped to particular protocol features of particular networks, the proper specification of a sufficiently abstract set of parameters is an open and difficult research issue, but one which needs to be studied if an operational internet configuration is ever to give really adequate service to a relatively naive end-user.
2) Providing the requested type of service may require cooperation from all the Switches (perhaps through the routing algorithm), and involves more than just mapping fields from the internet Network Access Protocol to the particular access protocols used by the component networks. If the type of service requested by the user is to be consistently meaningful, then his request must be given UNIFORM treatment by the internet Switches. Different gateways must not be allowed to treat the request differently. 2.4 Special Features The DoD Standard Internet Protocol contains a number of features which, while not strictly necessary in order for a user to get his data delivered, and distinct from the type of service field, do affect to some extent the service a user gets from the internet. Some of the features are worthy of comment, and that is the purpose of this section. 2.4.1 Time to Live The presence of the "time-to-live" field in the Catenet IP seems like a clear example of something that has no place in an access protocol. The IP specification [1] has some contradictory things to say about time-to-live. The user is supposed to set this field to the number of seconds after which he no longer cares to have his information delivered, or something like that. It's far from clear how some user is supposed to make a decision as to what value to set this to. For one thing, although this value is supposed to be represented in units of one second [1, p. 24], there does not appear to be any requirement for the gateways to figure out how many seconds to decrement this value by. The spec actually says that each gateway should decrement this field by at least one, even if it has no idea how much time has actually elapsed [1, p. 40]. Well, a user might ask, is this field represented in seconds or isn't it?
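The decrement rule just quoted can be made concrete. The sketch below (Python, for illustration; the hop counts and timings are invented) shows why a field maintained this way is not really "in seconds": each fast gateway costs a packet a full unit of life even when almost no time has elapsed.

```python
def decrement_ttl(ttl, elapsed_seconds):
    # The spec's rule: decrement by the elapsed time, but by at least
    # one, even when almost no time has actually passed.
    return ttl - max(1, int(elapsed_seconds))

def survives(ttl, per_hop_seconds, hops):
    """Does a packet outlive a path through the given number of gateways?"""
    for _ in range(hops):
        ttl = decrement_ttl(ttl, per_hop_seconds)
        if ttl <= 0:
            return False
    return True
```

Under this rule a time-to-live of 5 cannot carry a packet across six gateways of ten milliseconds each, even though the real elapsed time is only a small fraction of a second.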
What is the point of saying in the spec that the field is in seconds, if it is not necessarily in seconds? This will only result in confusion. That is, any attempt by a user to set this field to a reasonable value is likely to have unanticipated consequences. Any attempt to make inferences about internet behavior from the effects of various settings of the time-to-live field will necessarily be unreliable. At any rate, unless the Switches all keep a synchronized clock, there is no real way for them to determine how long a packet has been in the network (or internet), as opposed to how much time it has spent in the Switches, and this difference may be significant if a packet is sent over several long-haul networks with long-delay lines but fast Switches. It's hard to see the point of requiring a user to specify, in the Network Access Protocol, a value which cannot be assigned any consistent meaning. (It's not clear what value this information has anyway; according to the IP spec, "the intention is to cause undeliverable datagrams to be discarded" [1, p. 24]. But a reasonable routing algorithm should cause undeliverable datagrams to be discarded anyway, no matter what value is specified for time-to-live.) It seems plain in any case that over the years, Host personnel will tend to set this field to its maximum value anyway. In most implementations, the setting of this field will not be left to the end-user, but will be in the code which implements the IP. Several years from now, no one will remember the importance of setting this field correctly. Eventually, someone will discover that the data he sends to a certain place does not get through, and after months of intensive investigation, it will turn out that his IP is setting too small a value in the time-to-live field, and his packets are dying just before they reach their destination. This will make people tend to use the maximum value as a default, reducing the utility of the information to almost nil. (No one will want to spend the time re-tuning this value to the optimum as the internet configuration expands, causing real packet delays to become longer and longer. In fact, at many Host sites there may not be anyone who can figure out enough of the Host code to be able to re-tune this value.) Time-to-live, while useful for debugging purposes (perhaps), has no real place in an operational system, and hence is not properly part of a Network Access Protocol. If the Switches of a Network Structure want to perform packet life timing functions, in a way which is under the control of a single network administration, and easily modified to reflect changing realities, that is one thing. It is quite a different thing to build this into a Host-level protocol, with a contradictory spec, where it will certainly fall into disuse, or misuse. Protocol features which are only useful (at best) for network experimenters and investigators are bound to cause trouble when invoked at the Host level, as part of a protocol which every Host must implement, and whose implementers may not fully understand the implications of what they are doing. Some of these difficulties have, as their basic cause, the old implicit model of the internet that we discussed in IEN 185. The IP conflates protocol features that properly belong to the Network Access Protocol with features that properly belong to the protocol used internally among the Switches. This sort of conflation, and consequent violation of protocol layering, are inevitable if the gateways are seen as hosts which patch networks together, rather than as Switches in an autonomous Network Structure. 2.4.2 Source Routing The current IP has a feature known as "source routing," which allows each user to specify the sequence of networks that his internet packet is to travel.
We mention this primarily as an example of something that a Network Access Protocol in a truly operational environment ought not to have. An acceptable internet routing algorithm ought to distribute the traffic in order to achieve some general goal on an internet-wide basis, such as minimizing delay, maximizing throughput, etc. Any such routing algorithm is subverted if each user is allowed to specify his own route. Much of the routing algorithm's ability to prevent or avoid congestion is also compromised if certain packets are allowed to follow a route pre-determined by some user, even if the routing algorithm determines that best service (either for those packets themselves, or for other packets in the internet) would be obtained if those packets followed a different route. To a certain extent, the presence of the source routing option in the IP is probably a result of the rather poor routing strategy in the present Catenet, and a way of attempting to obtain better service than the routing algorithm can actually provide. The long-term solution to this problem would be to improve the routing algorithm, rather than to subvert it with something that is basically a kludge. We would claim that the existence of any application or service that seems to require the use of source routing is really an indication of some lack or failure in the design of the internet, and a proper long-term solution is to improve the situation by making basic architectural changes in the internet, rather than by grafting on new kludges. Source routing also has its use as an experimental device, allowing tests to be performed which might indicate whether it is really worthwhile to add some new feature or service to the internet. (Although the way in which source routing subverts the basic internet routing algorithm can have disturbing side-effects on the experimental results, which must be properly controlled for.)
However, we doubt that any truly useful experiments requiring source routing can be performed by individual users in isolation. Rather, useful experiments would seem to require the cooperation and coordination of the participating users as well as those who are responsible for controlling and maintaining the internet. So it is not clear that there is any true utility to having a source routing option at the level of the Network Access Protocol, thereby giving each and every user the option of using it. In an operational environment, this feature should either be eliminated, or controlled through the use of authorizations, which would cause gateways to discard source-routed packets which lack proper authorization. 2.4.3 Fragmentation and Reassembly One of the few problems which is really specific to an internet whose pathways consist of packet-switching networks is the fact that it is difficult to specify to the user a maximum packet size to use when giving traffic to the internet. If a user's traffic is to go through EVERY component packet-switching network, then the maximum packet size he can use is that of the component network with the smallest maximum packet size. Yet it seems unwise to require that no user ever exceed the maximum packet size of the component network with the smallest maximum packet size. To do so might lead to very inefficient use of other component networks which permit larger packet sizes. If a particular user's traffic does not happen to traverse the component network with the smallest maximum packet size, the restriction really does no good, and only leads to inefficiency. Since, in a large internet, most traffic will probably traverse only a small subset of the component networks, this is quite important. In addition, some Hosts with limited resources might have a high overhead on a per-packet basis, making it quite important to allow them to put larger packets into the internet. This gives rise to the question of what an internet Switch should do if it must route a packet over a certain Pathway, but that packet is larger than the maximum size of packets that can be carried over that Pathway. The solution that has been adopted in the current Catenet is to allow the internet Switches to "fragment" the packets into several pieces whenever this is necessary in order to send the packet over a Pathway with a small maximum packet size. Each fragment of the original packet is now treated as an independent datagram, to be delivered to the destination Host. It is the responsibility of the destination Host to reassemble the original packet from all the fragments before passing it up to the next highest protocol layer. (If the destination happens to have a high per-packet overhead, too bad.) The IP has several features whose only purpose is to enable this reassembly. These features are extremely general, so that fragments can be further fragmented, ad infinitum, and correct reassembly will still be possible. However, it seems that this feature has not had very much operational testing in the Catenet; gateway implementers seem to be as reluctant to actually implement fragmentation as Host implementers are to implement reassembly. If at least one gateway does do fragmentation, then if some Host does not do reassembly, it cannot, in general, talk to any other Host on the internet. If a source Host knows that a destination Host does not do reassembly, then it can, through IP, indicate to the gateways that they ought not to fragment. However, in that case, any datagrams that are not fragmentable but which must be transmitted over a Pathway with a smaller maximum packet size are simply lost in transit.
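The fragmentation operation just described might be sketched as follows. This is an illustrative simplification in Python: each fragment carries an (id, offset, more-fragments flag, data) tuple, whereas the real IP header encodes offsets in units of eight octets and carries many other fields.

```python
# Sketch of Switch-side fragmentation: a datagram larger than a
# Pathway's maximum packet size is split into fragments, each carrying
# an offset and a more-fragments flag so that the destination can
# reassemble.  Simplified relative to the actual IP header layout.
def fragment(datagram_id, payload, max_size):
    if len(payload) <= max_size:
        return [(datagram_id, 0, False, payload)]
    frags = []
    for off in range(0, len(payload), max_size):
        piece = payload[off:off + max_size]
        more = off + max_size < len(payload)   # False only on the last piece
        frags.append((datagram_id, off, more, piece))
    return frags
```

Because each fragment carries its own offset, fragments can themselves be further fragmented by a later Switch without confusing the receiver, which is the generality the IP scheme provides.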
It should be noted that the procedure of doing reassembly in the destination Host violates the precepts of protocol layering in a basic way. The internet is not transparent to protocol modules in the Hosts, since a datagram put into the internet by a protocol module in the source Host might appear at the destination Host in quite a different form, viz., as a set of fragments. One might try to avoid this conclusion by claiming that what we have been calling "the Host software modules" are really part of a Switch, rather than part of a Host, so that no transparency is violated. One could also claim that a dog has five legs, by agreeing to call its tail a leg. But this would no more make a tail a leg than calling a Host software module "part of the network" makes it so. One of the main advantages of properly layered protocols is the ability it provides to change the network without having to change the Hosts. This is needed if changes to the network are even to be possible, since any change that requires Host software to change is, for all practical purposes, impossible. This suggests that the boundary of the network be drawn at the boundary where changes are possible without coordination among an unlimited number of Host administrations, and the natural place to draw this boundary is around the Switches. While the Switches of a Network Structure can all be under the control of a common administration, the Hosts cannot. This suggests that any violation of protocol layering that is as gross as the need to have Hosts do reassembly is a problem that is to be avoided whenever possible. The problems of writing Host-level software to do reassembly in a reliable manner do not seem to have been fully appreciated. If a Host's resources (such as buffer space, queuing slots, table areas, etc.) are very highly utilized, all sorts of performance sub-optimalities are possible.
Without adequate buffer management (see IEN 182), even lock-ups are possible. One must remember that reassembly is not a simple matter of sending the fragments to the next higher level process in proper sequence. The situation is more complex, since the first fragment of a datagram cannot be sent up to the next higher protocol level until all the fragments of that datagram are received. If buffers are not pre-allocated at the destination Host, then fragments of some datagrams may need to be discarded to ensure that there is room to hold all the fragments of some other datagram; otherwise "reassembly lockup" is possible. If the internet gateways really did a large amount of fragmentation, so that Hosts needed to do a large amount of reassembly, this would almost certainly give rise to a variety of peculiar performance problems and phasing effects which could make the recently discovered "silly window syndrome" look quite benign. Unfortunately, it is hard to gain an appreciation of these sorts of problems until one has personally encountered them, at which point it is often too late to do anything about them. Performance considerations (as opposed simply to considerations of functionality) would seem to indicate that fragmentation and reassembly be avoided whenever possible. Note that performance problems associated with reassembly might crop up suddenly at any time in the life of the internet, as some Host which rarely received fragments in the past suddenly finds itself bombarded with them, possibly due to a new application. Since this sort of effect is notoriously difficult to test out in advance, one would expect potential problems to be lying in wait. Problems like these tend to crop up at a time when the Host administration has no one available who understands and can modify the Host software, which means that such problems can be very intransigent and difficult to remedy.
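A sketch of destination-Host reassembly (Python, for illustration) shows where the trouble lies: nothing can be passed up until every fragment of a datagram has arrived, so a bounded buffer pool must sometimes discard incomplete datagrams to avoid the "reassembly lockup" described above. The discard-the-oldest policy used here is an assumption, and overlapping fragments are not handled.

```python
class Reassembler:
    def __init__(self, max_buffered_fragments):
        self.limit = max_buffered_fragments
        # datagram id -> {"frags": {offset: piece}, "total": length or None}
        self.pending = {}

    def receive(self, datagram_id, offset, more, piece):
        """Return the reassembled payload, or None while fragments are missing."""
        if datagram_id not in self.pending:
            buffered = sum(len(e["frags"]) for e in self.pending.values())
            if buffered >= self.limit and self.pending:
                # Discard the oldest incomplete datagram to make room;
                # without some such policy, reassembly lockup is possible.
                del self.pending[next(iter(self.pending))]
            self.pending[datagram_id] = {"frags": {}, "total": None}
        entry = self.pending[datagram_id]
        entry["frags"][offset] = piece
        if not more:                    # the last fragment fixes the total length
            entry["total"] = offset + len(piece)
        if entry["total"] is not None and \
           sum(map(len, entry["frags"].values())) == entry["total"]:
            del self.pending[datagram_id]
            return b"".join(entry["frags"][o] for o in sorted(entry["frags"]))
        return None
```

Note that even this toy version must hold partial datagrams for an unbounded time and must choose victims for discard; real Host implementations face the same choices under far less predictable load.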
Of course, problems in Host networking software are usually blamed on the network (i.e., on the Switches), which also does not help to speed problem resolution. One way to remove this sort of problem from the Host domain is to have the destination Switches themselves do any necessary reassembly before passing a datagram on to its destination Host. This has the advantage that problems which arise will fall under the domain of the Network administration, which is more likely to be able to deal with them than are the various Host administrations. However, this really does not simplify the situation, or reduce the amount of performance sub-optimalities that we might be faced with; it just takes the same problems and puts them somewhere else. ARPANET IMPs do fragmentation (though only at the source IMP) and reassembly at the destination IMP, and this has turned out to be quite a tricky and problem-strewn mechanism. Other approaches should be investigated. Of course, one possible way around fragmentation is to adopt a policy of not routing any packets over Pathways which cannot handle packets of that size. If there are several possible routes between source and destination, which have similar characteristics except for the fact that one of them has a maximum packet size which is too small, the most efficient means of handling this problem might just be to avoid using the route which would require fragmentation. Even if this means taking a slightly longer route to the destination, the extra delay imposed during internet transit might be more than compensated for by the reduction in delay that would be obtained by not forcing the destination Host to do reassembly. Of course, this scheme requires interaction with routing, but as long as there are a small number of possible maximum packet sizes, this scheme is not difficult to implement (at least, given a reasonable routing algorithm).
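The routing interaction suggested above might be sketched as follows (Python, for illustration; the representation of a route by a pair of delay and smallest maximum packet size along the route is our assumption).

```python
# Sketch: among candidate routes, prefer the best route whose smallest
# maximum packet size still accommodates the datagram, falling back to
# a route requiring fragmentation only when no other choice exists.
def choose_route(routes, packet_size):
    """routes: list of (delay, smallest maximum packet size) pairs."""
    usable = [r for r in routes if r[1] >= packet_size]
    if usable:
        return min(usable)   # best route requiring no fragmentation
    return min(routes)       # fragmentation is unavoidable on any route
```

The slightly-longer usable route wins here, reflecting the judgment in the text that a little extra transit delay can be cheaper than forcing the destination Host to reassemble.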
Unfortunately, it might be the case that there just is no route at all to a particular destination, or else no reasonable route, which does not utilize a Pathway whose maximum packet size is "too small." In this case, there seems no way around fragmentation and reassembly. However, a scheme which is worth considering is that of doing hop-by-hop fragmentation and reassembly within the internet. That is, rather than having reassembly done at the destination (Switch or Host), it is possible to do reassembly at the Switch which is the exit point from a component network which has an unusually small packet size. Datagrams would be fragmented upon entry to such networks, and reassembled upon exit from them, with no burden on either the destination Switch or the destination Host. The fact that fragments would never travel more than one hop without reassembly ameliorates the performance problems somewhat, since the amount of time a partially reassembled datagram might have to be held would be less, in general, than if reassembly were done on an end-end basis. A strategy of doing hop-by-hop reassembly and fragmentation also allows more efficient use of the internet's Pathways in certain cases. One problem with the end-end strategy is the essential "randomness" of its effects. Consider, for example, a large packet which must traverse several networks with large maximum packet sizes, and then one network with a small maximum packet size. The current method of doing fragmentation and reassembly allows the packet to remain large throughout the networks that can handle it, fragmenting it only when it reaches its final hop. This seems efficient enough, but consider the case where the FIRST internet hop is the network with the smallest maximum packet size, and the remaining hops are networks with large maximum packet sizes.
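The asymmetry just described can be quantified with a back-of-the-envelope count of the packets each component network must carry under the two strategies. The network and datagram sizes below are invented purely for illustration.

```python
import math

def packets_per_network(mtus, datagram_size, hop_by_hop):
    """Count the packets each component network carries for one datagram."""
    counts, size = [], datagram_size
    for mtu in mtus:
        if hop_by_hop:
            size = datagram_size   # reassembled at the exit of each network
        size = min(size, mtu)      # end-to-end: once fragmented, stays fragmented
        counts.append(math.ceil(datagram_size / size))
    return counts
```

With a 1000-octet datagram and a 126-octet first hop followed by two large-packet networks, the end-to-end strategy carries 8 packets through every network, while hop-by-hop reassembly restores the single large packet after the first hop.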
The current strategy then causes a very inefficient use of the internet, since the packet must now travel fragmented through ALL the networks, including the ones which would allow the larger packet size. If some of these networks impose constraints on a per-packet basis (which might either be flow control constraints, or monetary constraints based on per-packet billing), this inefficiency can have a considerable cost. Hop-by-hop reassembly, on the other hand, would allow the large packet to be reassembled and to travel through the remaining networks in the most cost-effective manner. Such a strategy is most consonant with our general thesis that an efficient and reliable internet must contain Switches which are specifically tuned to the characteristics of the individual Pathways. It also removes the problem from the Host domain, making the system more consonant with the precepts of protocol layering. There is, unfortunately, one situation in which hop-by-hop fragmentation cannot work. If the Pathway between some destination Host and the destination Switch has a small maximum packet size, so that the destination Switch must fragment datagrams intended for that Host, then reassembly must be done by the Host itself, since there is no Switch at the other end of the Pathway to do the reassembly. This seems to mean that Hosts whose "home networks" have unusually small maximum packet sizes will be forced to implement the ability to perform reassembly, and must tolerate any resultant performance disadvantages. 2.5 Flow Control The topic of "flow control" or "congestion control" (we shall be employing these terms rather interchangeably, ignoring any pedantic distinctions between them) breaks down naturally into a number of sub-topics. In this section we shall be concerned with only one such sub-topic, namely, how should the Switches of the Network Structure enforce flow control restrictions on the Hosts?
We shall not consider here the issue of how the Switches should do internal flow control, or what protocols they need to run among themselves to disseminate flow control information, but only the issue of how the results of any internal flow control algorithm should be fed back to the hosts. The IP is a rather unusual Network Access Protocol, in that it does not have any flow or congestion control features at all. This makes it very different from most other Network Access Protocols, such as 1822 or X.25, which do have ways of imposing controls on the rate at which users can put data into the network. The IP, on the other hand, is supposed to be a "datagram protocol", and therefore (?) is not supposed to impose any flow or congestion control restrictions on the rate at which data can be sent into the internet. In this section, we will discuss whether this is appropriate, and whether the "therefore" of the previous sentence is really correctly used. The issue of how flow or congestion control restrictions ought to be passed back to a Host, or more generally, how a Network Structure ought to enforce its congestion control restrictions, is a tricky issue. Particularly tricky is the relation between datagram protocols and flow control. Datagrams are sometimes known (especially with reference to the ARPANET) as "uncontrolled packets," which tends to suggest that no flow control should be applied to them. This way of thinking may be a holdover from the early days of the ARPANET, when it was quite lightly loaded. In those days, the flow control which the ARPANET imposes was much too strict, holding the throughput of particular connections to an unreasonably low value. Higher throughput could often be obtained by ignoring the controls, and just sending as much traffic as necessary for a particular application. Since the network was lightly loaded, ignoring the controls did not cause much congestion.
Of course, this strategy breaks down when applied to the more heavily loaded ARPANET of today. Too much uncontrolled traffic can cause severe congestion, which reduces throughput for everybody. Therefore many people now tend to recognize the need to control the uncontrolled packets, if we may be forgiven that apparent contradiction. Clearly, there is some tension here, since it makes little sense to regard the same traffic as both "controlled" and "uncontrolled." If a Network Access Protocol is developed on the assumption that it should be a "datagram protocol", and hence need not apply any controls to the rate at which data can be transferred, it will not be an effective medium for the enforcement of flow control restrictions at the host-network access point. If congestion begins to become a problem, so that people gradually begin to realize the importance of congestion control, they will find that the Network Access Protocol gives them no way to force the Hosts to restrict their traffic when that is necessary. The probable result of this scenario would be to try to develop a scheme to get the congestion control information to the Hosts in a way that bypasses the Network Access Protocol. This is our "logical reconstruction" of the current situation in the Catenet. When gateways think that there is congestion, they send "source quench" packets to the Hosts themselves, and the Hosts are supposed to do something to reduce the congestion. This source quench mechanism should be recognized for what it is, namely a protocol which is run between EVERY host and EVERY Switch (including intermediate Switches, not just source Switches) within a Network Structure, and which completely bypasses the Network Access Protocol (IP).
This violates protocol layering in a very basic way, since proper layering seems to imply that a source Host should have to run a protocol with a source Switch only, not with every Switch in the network. Of course, the fact that some mechanism appears to violate the constraints of protocol layering is not necessarily a fatal objection to it. However, given the present state of the art of flow control techniques, which is quite primitive, flow control procedures must be designed in a way that permits them to be easily modified, or even completely changed, as we learn more about flow control. We must be able to make any sort of changes to the internal flow control mechanism of a Network Structure without any need to make changes in Host-level software at the same time. ARPANET experience indicates quite clearly that changes which would be technically salutary, but which require Host software modifications, are virtually impossible to make. Host personnel cannot justify large expenditures of their own to make changes for which they perceive no crucial need of their own, just because network personnel believe the changes would result in better network service. If we want to be able to experiment with different internal flow control techniques in the internet, then we must provide a clean interface between the internal flow control protocols, and the way in which flow control information is fed back to the Hosts. We must define a relatively simple and straightforward interface by which a source Switch can enforce flow control restrictions on a Host, independently of how the source Switch determines just what restrictions to enforce. The way in which the Switches determine these restrictions can be changed as we learn more about flow control, but the Host interface will remain the same.
It is not clear that the source quench mechanism has been generally recognized as a new sort of protocol, which bypasses the usual Network Access Protocol for the internet (IP). One reason that it may seem strange to dignify this mechanism with the name of "protocol" is that no one really knows what a source quench packet really means, and no one really knows what they are supposed to do when they get one. So generally, they are just ignored, and the "procedure" of ignoring a control packet seems like a very degenerate case of a protocol. Further, the source quench mechanism is a protocol which Host software implementers seem to feel free to violate with impunity. No implementer could decide to ignore the protocols governing the form of addresses in the internet, or he would never be able to send or receive data. Yet there is no penalty for ignoring source quench packets, although violating the flow control part of the internetting protocol seems like something that really ought to be prohibited. (We have even heard rumors of Host software implementers who have decided to increase their rate of traffic flow into the internet upon receiving a source quench packet, on the grounds that if they are receiving source quench packets, some of their traffic is not getting through, and therefore they had better retransmit their traffic right away.) We have spoken of a source Switch needing to be able to ENFORCE flow control restrictions, by which we mean that when a source Switch determines that a certain source Host ought to reduce its rate of traffic, the Switch will REFUSE to accept traffic at a faster rate. Proper flow control can never be accomplished if we have to rely either on the good will or the good sense of Host software implementers.
(Remember that Host software implementations will continue for years after the internet becomes operational, and future implementers may not be as conversant as current implementers with networking issues.) This means a major change to the IP concept. Yet it seems to make much more sense to enhance the Catenet Network Access Protocol to allow for flow control than to try to bypass the Network Access Protocol entirely by sending control information directly from intermediate Switches to a Host which is only going to ignore it.

We will not discuss internal flow control mechanisms here, except to say that we do not believe at all in "choke packet" schemes, of which the source quench mechanism is an example. Eventually, we will propose an internal congestion control scheme for the internet, but it will not look at all like the source quench mechanism. (Chapters 5 and 6 of [2] contain some interesting discussions of congestion control in general, and of choke packet schemes in particular.)

It appears that some internet workers are now becoming concerned with the issue of what to do when source quench packets are received, but this way of putting the question is somewhat misdirected. When you get some information, and you still don't know what decision to make or what action to take, maybe the problem is not so much in the decision-making process as it is in the information. The proper question is not, "what should we do when we get source quench packets?", but rather, "what should we get instead of source quench packets that would provide a clear and meaningful indication as to what we should do?"

Does this mean that the internet Network Access Protocol should not really be a datagram protocol? To some extent, this is merely a terminological issue.
There is no reason why a protocol cannot enforce congestion or flow control without also imposing reliability or sequentiality, or any other features that may unnecessarily add delay or reduce throughput. Whether such a protocol would be called a "datagram protocol" is a matter of no import. It is worth noting, though, that the Network Access Protocol of AUTODIN II (SIP), while officially known as a datagram protocol, does impose and enforce flow control restrictions on its hosts.

The only real way for a source Switch to enforce its flow control restrictions on a source Host is simply for the Switch to REFUSE packets from that Host if the Host is sending too rapidly. At its simplest, the Switch could simply drop the packets, with no further action. A somewhat more complex procedure would have the Switch inform the Host that a packet had been dropped. A yet more complex procedure would tell the Host when to try again. Even more complex schemes, like the windowing scheme of X.25, are also possible. To make any of these work, however, it seems that a source Switch (gateway) will have to maintain Host-specific traffic information, which will inevitably place a limit on the number of Hosts that can be accessing a source Switch simultaneously. Yet this seems inevitable if we are to take seriously the need for flow control. At any rate, the need for flow control really implies the need for the existence of such limits.

2.6 Pathway Access Protocol Instrumentation

Fault isolation in an internet environment is a very difficult task, since there are so many components, and so many ways for each to fail, that a performance problem perceived by the user may be caused by any of a thousand different scenarios. Furthermore, by the time the problem becomes evident at the user level, information as to the cause of the problem may be long gone.
Effective fault isolation in the internet environment will require proper instrumentation in ALL internet components, including the Hosts. We will end this paper with a few remarks about the sort of instrumentation that Hosts should have, to help in fault-isolation when there is an apparent network problem.

We have very often found people blaming the ARPANET for lost data, when in fact the problem is entirely within the host itself. The main source of this difficulty is that there often is no way for host personnel to find out what is happening within the host software. Sometimes host personnel will attempt to deduce the source of the apparent problem by watching the lights on the IMP interface blink, and putting that information together with the folklore that they have heard about the network (which folklore is rarely true). Our ARPANET experience shows quite clearly that this sort of fault-isolation procedure just is not useful at all. What is really needed is a much more complex, objective, and SYSTEMATIC form of instrumentation, which unfortunately is much more difficult to do than simply looking at the blinking lights.

Some sorts of essential instrumentation are quite specific to the sort of Network Access Protocol or Pathway Access Protocol that is being used. For example, users of the ARPANET often complain that the IMP is blocking their host for an excessive amount of time. By itself, this information is not very useful, since it is only a symptom which can have any of a large number of causes. In particular, the host itself may be forcing the IMP to block by attempting to violate ARPANET flow control restrictions.
One sort of instrumentation which would be useful for the host to have is a way of keeping track of the total time it is blocked by the IMP, with the blocking time divided into the following categories:

1) Time blocked between messages.

2) Time blocked between the leader of a message and the data of the message.

3) Time blocked between packets.

4) Time blocked while attempting to send a multi-packet message (a subset of 2).

5) Time blocked during transmission of the data portion of a packet.

6) Time blocked while attempting to transmit a datagram (a subset of 2).

While this information might be very non-trivial for a host to gather, it does not help us very much in fixing the problem just to know that "the IMP is blocking" unless we can get a breakdown like this. In addition, it is useful to have those categories further broken down by destination Host, in case the blocking is specific to some particular set of hosts.

Additional useful information has to do with the 1822 reply messages. What percentage of transmitted messages are replied to with RFNMs? With DEADs? With INCOMPLETEs? This should also be broken down by destination host. In fact, it would be useful to keep track of the number of each possible 1822 IMP-host control message that is received. When problems arise, it may be possible to correlate this information with the problem symptoms.

The basic idea here should be clear -- besides just telling us that "the network isn't taking packets fast enough", host personnel should be able to tell us under what conditions the network is or is not taking packets, and just what "fast enough" means. If a host is also running an access protocol other than (or in addition to) 1822, there will be specific measurements relevant to the operation of that protocol, but in order to say just what they are, one must be familiar with those particular protocols.
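The blocking-time breakdown suggested above might be accumulated along these lines. This is only a sketch: the category names and the interface are invented for illustration, not drawn from any existing host's software:

```python
from collections import defaultdict

class BlockingTimer:
    """Host-side instrumentation sketch: accumulate time blocked by the IMP,
    divided into the six categories above and further broken down by
    destination host."""
    CATEGORIES = (
        "between_messages",      # 1) between messages
        "leader_to_data",        # 2) between a message's leader and its data
        "between_packets",       # 3) between packets
        "multipacket_send",      # 4) sending a multi-packet message
        "during_packet_data",    # 5) during a packet's data portion
        "datagram_send",         # 6) transmitting a datagram
    )

    def __init__(self):
        self.total = defaultdict(float)      # category -> total seconds blocked
        # the same totals, further broken down by destination host
        self.by_dest = defaultdict(lambda: defaultdict(float))

    def record(self, category: str, dest: str, seconds: float) -> None:
        """Charge one blocking interval to a category and a destination."""
        assert category in self.CATEGORIES
        self.total[category] += seconds
        self.by_dest[dest][category] += seconds
```

With such counters, host personnel can report not merely that "the IMP is blocking" but where the time went and whether it is specific to certain destinations.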
(Again we see the effects of particular Pathway characteristics, this time on the sort of instrumentation needed for good fault isolation.)

In general, whenever any protocol module is designed and implemented, the designer AND implementer (each of whom can contribute from a different but equally valuable perspective) should try to think of anything the protocol or the software module which implements it might do which could hold up traffic flow (e.g., flow control windows being closed, running out of sequence number space, failing to get timely acknowledgments, process getting swapped out, etc.), and should be able to gather statistics (say, average and maximum values of the amount of time data transfer is being held up for each possible cause) which tell us how the protocol module is performing. If a protocol requires (or allows) retransmissions, the rate of retransmission is a very useful statistic, especially if broken down by destination host.

Hosts should be able to supply statistics on the utilization of host resources. Currently, for example, many hosts cannot even provide any information about their buffer utilization, or about the lengths of the various queues which a packet must traverse when traveling (in either direction) between the host and the IMP. Yet very high buffer utilization or very long queues within the host may be a source of performance problems.

When a packet has to go through several protocol modules within a host (say, from TELNET to TCP to IP to 1822), the host should be able to supply statistics on the average and maximum times it takes for a packet to get through each of these modules. This can help in the discovery of unexpected bottlenecks within the host. (For example, packets may take an unexpectedly long amount of time to get through a certain module because the module is often swapped out.
This is something that is especially likely to happen some years after the host software is initially developed, when no one remembers anymore that the host networking software is supposed to have a high priority. This sort of instrumentation can be quite tricky to get just right, since one must make sure that there is no period of time that slips between the time-stamps.)

The offered and obtained throughputs through each protocol module are also useful statistics. In addition, if a host can ever drop packets, it should keep track of this. It should be able to provide information as to what percentage of packets to (or from) each destination host (or source host) were dropped, and this should be further broken down into categories indicating why the packets were dropped. (Reasons for hosts' dropping packets will vary from implementation to implementation.)

Note that this sort of instrumentation is much harder to implement if we are using datagram protocols than if we are using protocols with more control information, because much of this instrumentation is based on sent or received control information. The less control information we have, the less we can instrument, which means that fault-isolation and performance evaluation become much harder. This seems to be a significant, though not yet widely-noticed, disadvantage of datagram protocols.

Host personnel may want to consider having some amount of instrumentation in removable packages, rather than in permanently resident code. This ability may be essential for efficiency reasons if the instrumentation code is either large or slow. In that case, it might be necessary to load it in only when a problem seems evident. Instrumentation should also have the ability to be turned on and off, so that it is possible to gather data over particular time windows.
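The statistics described above (per-cause hold-up times with average and maximum, drop counts broken down by destination and reason, and the ability to switch collection on and off) might be kept along these lines. The class and its interface are invented for illustration only:

```python
from collections import defaultdict

class ProtocolInstrumentation:
    """Sketch of the statistics a protocol module might keep: per-cause
    hold-up times (with average and maximum), and dropped-packet counts
    broken down by destination host and by reason.  Collection can be
    switched off, so that data is gathered over a chosen time window."""

    def __init__(self):
        self.enabled = True
        self.holdups = defaultdict(list)   # cause -> list of seconds held up
        self.sent = defaultdict(int)       # destination -> packets attempted
        # destination -> reason -> number of packets dropped
        self.drops = defaultdict(lambda: defaultdict(int))

    def record_holdup(self, cause: str, seconds: float) -> None:
        """Note one interval during which data transfer was held up."""
        if self.enabled:
            self.holdups[cause].append(seconds)

    def holdup_summary(self, cause: str) -> tuple:
        """Return (average, maximum) hold-up time for one cause."""
        samples = self.holdups[cause]
        return sum(samples) / len(samples), max(samples)

    def packet_sent(self, dest: str) -> None:
        if self.enabled:
            self.sent[dest] += 1

    def packet_dropped(self, dest: str, reason: str) -> None:
        if self.enabled:
            self.sent[dest] += 1           # a dropped packet was still attempted
            self.drops[dest][reason] += 1

    def drop_percentage(self, dest: str) -> float:
        """Percentage of packets to this destination that were dropped."""
        return 100.0 * sum(self.drops[dest].values()) / self.sent[dest]
```

Because the counters are an ordinary object rather than code woven through the data path, such a module could also live in a removable package, loaded only when a problem seems evident.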
Such on-off control is necessary if the instrumentation is to be used as part of the evaluation of an experiment.

REFERENCES

1. "DOD Standard Internet Protocol," COMPUTER COMMUNICATION REVIEW, October 1980, pp. 12-51.

2. "ARPANET Routing Algorithm Improvements," BBN Report No. 4473, August 1980.