A raw socket is a socket that allows access to packet
headers on incoming and outgoing packets. Raw sockets
always receive the packets with the packet header included
as opposed to non-raw sockets which strip the header and
only receive the packet payload.
Mac OS is the trademarked name for a series of graphical
user interface-based operating systems developed by Apple
Inc. (formerly Apple Computer, Inc.) for their Macintosh
line of computer systems.
PC is Personal computer, a computer whose original sales
price, size, and capabilities make it useful for
individuals
Mac OS works on PC. So both are diffrent.. BOth are counter
part of each other. Mac is Software and PC is hardware.
LILO is Linux Loader is a boot loader for Linux. It is used
to load Linux into the memory and start the Operating
system. LILO can be configured to boot other operating
systems as well. LILO is customizable, which means that if
the default configuration is not correct, it can be changed.
Config file for LILO is lilo.conf.
4. How would I put my socket in non-blocking mode?
Technically, fcntl(soc, F_SETFL, O_NONBLOCK) is incorrect since it clobbers all other file flags. Generally one gets away with it since the other flags (O_APPEND for example) don't really apply much to sockets. In a similarly rough vein, you would use fcntl(soc, F_SETFL, 0) to go back to blocking mode.
To do it right, use F_GETFL to get the current flags, set or clear the O_NONBLOCK flag, then use F_SETFL to set the flags.
And yes, the flag can be changed either way at will.
5. How come only the first part of my datagram is getting through?
This has to do with the maximum size of a datagram on the two machines involved. This depends on the sytems involved, and the MTU (Maximum Transmission Unit). According to "UNIX Network Programming", all TCP/IP implementations must support a minimum IP datagram size of 576 bytes, regardless of the MTU. Assuming a 20 byte IP header and 8 byte UDP header, this leaves 548 bytes as a safe maximum size for UDP messages. The maximum size is 65516 bytes. Some platforms support IP fragmentation which will allow datagrams to be broken up (because of MTU values) and then re-assembled on the other end, but not all implementations support this.
Another issue is fragmentation. If a datagram is sent which is too large for the network interface it is sent through, then the sending host will fragment it into smaller packets which are reassembled by the receiving host. Also, if there are intervening routers, then they may also need to fragment the packet(s), which greatly increases the chances of losing one or more fragments (which causes the entire datagram to be dropped). Thus, large UDP datagrams should be avoided for applications that are likely to operate over routed nets or the Internet proper.
6. How can I be sure that UDP messages are received in order?
You can't. What you can do is make sure that messages are processed in order by using a numbering system as mentioned in ``5.5 How can I be sure that a UDP message is received?''. If you need your messages to be received and be received in order you should really consider switching to TCP. It is unlikely that you will be able to do a better job implementing this sort of protocol than the TCP people already have, without a significant investment of time.
7. How can I read ICMP errors from connected UDP sockets?
If the target machine discards the message because there is no process reading on the requested port number, it sends an ICMP message to your machine which will cause the next system call on the socket to return ECONNREFUSED. Since delivery of ICMP messages is not guarenteed you may not recieve this notification on the first transaction.
8. What is the difference between connected and unconnected sockets?
If a UDP socket is unconnected, which is the normal state after a bind() call, then send() or write() are not allowed, since no destination address is available; only sendto() can be used to send data.
Calling connect() on the socket simply records the specified address and port number as being the desired communications partner. That means that send() or write() are now allowed; they use the destination address and port given on the connect call as the destination of the packet.
9. How can I write a multi-homed server?
I want to run a server on a multi-homed host. The host is part of two networks and has two ethernet cards. I want to run a server on this machine, binding to a pre-determined port number. I want clients on either subnet to be able to send broadcast packates to the port and have the server receive them.
Your first question in this scenario is, do you need to know which subnet the packet came from? I'm not at all sure that this can be reliably determined in all cases.
If you don't really care, then all you need is one socket bound to INADDR_ANY. That simplifies things greatly.
If you do care, then you have to bind multiple sockets. You are obviously attempting to do this in your code as posted, so I'll assume you do.
I was hoping that something like the following would work. Will it? This is on Sparcs running Solaris 2.4/2.5.
I don't have access to Solaris, but I'll comment based on my experience with other Unixes.
What you are doing is attempting to bind all the current hosts unicast addresses as listed in hosts/NIS/DNS. This may or may not reflect reality, but much more importantly, neglects the broadcast addresses. It seems to be the case in the majority of implementations that a socket bound to a unicast address will not see incoming packets with broadcast addresses as their destinations.
The approach I've taken is to use SIOCGIFCONF to retrieve the list of active network interfaces, and SIOCGIFFLAGS and SIOCGIFBRDADDR to identify broadcastable interfaces and get the broadcast addresses. Then I bind to each unicast address, each broadcast address, and to INADDR_ANY as well. That last is necessary to catch packets that are on the wire with INADDR_BROADCAST in the destination. (SO_REUSEADDR is necessary to bind INADDR_ANY as well as the specific addresses.)
This gives me very nearly what I want. The wrinkles are:
o I don't assume that getting a packet through a particular socket necessarily means that it actually arrived on that interface.
o I can't tell anything about which subnet a packet originated on if its destination was INADDR_BROADCAST.
o On some stacks, apparently only those with multicast support, I get duplicate incoming messages on the INADDR_ANY socket.
10. How should I choose a port number for my server?
The list of registered port assignments can be found in STD 2 or RFC 1700. Choose one that isn't already registered, and isn't in /etc/services on your system. It is also a good idea to let users customize the port number in case of conflicts with other un- registered port numbers in other servers. The best way of doing this is hardcoding a service name, and using getservbyname() to lookup the actual port number. This method allows users to change the port your server binds to by simply editing the /etc/services file.
11. How can I bind() to a port number < 1024?
The restriction on access to ports < 1024 is part of a (fairly weak) security scheme particular to UNIX. The intention is that servers (for example rlogind, rshd) can check the port number of the client, and if it is < 1024, assume the request has been properly authorised at the client end.
The practical upshot of this, is that binding a port number < 1024 is reserved to processes having an effective UID == root.
This can, occasionally, itself present a security problem, e.g. when a server process needs to bind a well-known port, but does not itself need root access (news servers, for example). This is often solved by creating a small program which simply binds the socket, then restores the real userid and exec()s the real server. This program can then be made setuid root.
12. What exactly does SO_LINGER do?
On some unixes this does nothing. On others, it instructs the kernel to abort tcp connections instead of closing them properly. This can be dangerous.
13. How can I listen on more than one port at a time?
The best way to do this is with the select() call. This tells the kernel to let you know when a socket is available for use. You can have one process do i/o with multiple sockets with this call. If you want to wait for a connect on sockets 4, 6 and 10 you might execute the following code snippet:
fd_set socklist;
FD_ZERO(&socklist); /* Always clear the structure first. */
FD_SET(4, &socklist);
FD_SET(6, &socklist);
FD_SET(10, &socklist);
if (select(11, NULL, &socklist, NULL, NULL) < 0)
perror("select");
The kernel will notify us as soon as a file descriptor which is less than 11 (the first parameter to select()), and is a member of our socklist becomes available for writing. See the man page on select() for more details.
14. Why do not my sockets close?
When you issue the close() system call, you are closing your interface to the socket, not the socket itself. It is up to the kernel to close the socket. Sometimes, for really technical reasons, the socket is kept alive for a few minutes after you close it. It is normal, for example for the socket to go into a TIME_WAIT state, on the server side, for a few minutes. People have reported ranges from 20 seconds to 4 minutes to me. The official standard says that it should be 4 minutes. On my Linux system it is about 2 minutes. This is explained in great detail in ``2.7 Please explain the TIME_WAIT state.''.
When the size of the incoming data is unknown, you can either make the size of the buffer as big as the largest possible (or likely) buffer, or you can re-size the buffer on the fly during your read. When you malloc() a large buffer, most (if not all) varients of unix will only allocate address space, but not physical pages of ram. As more and more of the buffer is used, the kernel allocates physical memory. This means that malloc'ing a large buffer will not waste resources unless that memory is used, and so it is perfectly acceptable to ask for a meg of ram when you expect only a few K.
On the other hand, a more elegant solution that does not depend on the inner workings of the kernel is to use realloc() to expand the buffer as required in say 4K chunks (since 4K is the size of a page of ram on most systems). I may add something like this to sockhelp.c in the example code one day.
** Let the system choose your client's port number **
The exception to this, is if the server has been written to be picky about what client ports it will allow connections from. Rlogind and rshd are the classic examples. This is usually part of a Unix-specific (and rather weak) authentication scheme; the intent is that the server allows connections only from processes with root privilege. (The weakness in the scheme is that many O/Ss (e.g. MS-DOS) allow anyone to bind any port.)
The rresvport() routine exists to help out clients that are using this scheme. It basically does the equivalent of socket() + bind(), choosing a port number in the range 512..1023.
If the server is not fussy about the client's port number, then don't try and assign it yourself in the client, just let connect() pick it for you.
If, in a client, you use the naive scheme of starting at a fixed port number and calling bind() on consecutive values until it works, then you buy yourself a whole lot of trouble:
The problem is if the server end of your connection does an active close. (E.G. client sends 'QUIT' command to server, server responds by closing the connection). That leaves the client end of the connection in CLOSED state, and the server end in TIME_WAIT state. So after the client exits, there is no trace of the connection on the client end.
Now run the client again. It will pick the same port number, since as far as it can see, it's free. But as soon as it calls connect(), the server finds that you are trying to duplicate an existing connection (although one in TIME_WAIT). It is perfectly entitled to refuse to do this, so you get, I suspect, ECONNREFUSED from connect(). (Some systems may sometimes allow the connection anyway, but you can't rely on it.)
This problem is especially dangerous because it doesn't show up unless the client and server are on different machines. (If they are the same machine, then the client won't pick the same port number as before). So you can get bitten well into the development cycle (if you do what I suspect most people do, and test client & server on the same box initially).
Even if your protocol has the client closing first, there are still ways to produce this problem (e.g. kill the server).
17. Why do I sometimes lose a servers address when using more than one server?
Take a careful look at struct hostent. Notice that almost everything in it is a pointer? All these pointers will refer to statically allocated data.
For example, if you do:
struct hostent *host = gethostbyname(hostname);
then (as you should know) a subsequent call to gethostbyname() will overwrite the structure pointed to by 'host'.
But if you do:
struct hostent myhost;
struct hostent *hostptr = gethostbyname(hostname);
if (hostptr) myhost = *host;
to make a copy of the hostent before it gets overwritten, then it still gets clobbered by a subsequent call to gethostbyname(), since although myhost won't get overwritten, all the data it is pointing to will be.
You can get round this by doing a proper 'deep copy' of the hostent structure, but this is tedious. My recommendation would be to extract the needed fields of the hostent and store them in your own way.
It might be nice if you mention MT safe libraries provide complimentary functions for multithreaded programming. On the solaris machine I'm typing at, we have gethostbyname and gethostbyname_r (_r for reentrant). The main difference is, you provide the storage for the hostent struct so you always have a local copy and not just a pointer to the static copy.
18. How can my client work through a firewall/proxy server?
If you are running through separate proxies for each service, you shouldn't need to do anything. If you are working through sockd, you will need to "socksify" your application.
19. How can I find the full hostname (FQDN) of the system I am running on?
Some systems set the hostname to the FQDN and others set it to just the unqualified host name. I know the current BIND FAQ recommends the FQDN, but most Solaris systems, for example, tend to use only the unqualified host name.
Regardless, the way around this is to first get the host's name (perhaps an FQDN, perhaps unaualified). Most systems support the Posix way to do this using uname(), but older BSD systems only provide gethostname(). Call gethostbyname() to find your IP address. Then take the IP address and call gethostbyaddr(). The h_name member of the hostent{} should then be your FQDN.
20. When will my application receive SIGPIPE?
Very simple: with TCP you get SIGPIPE if your end of the connection has received an RST from the other end. What this also means is that if you were using select instead of write, the select would have indicated the socket as being readable, since the RST is there for you to read (read will return an error with errno set to ECONNRESET).
Basically an RST is TCP's response to some packet that it doesn't expect and has no other way of dealing with. A common case is when the peer closes the connection (sending you a FIN) but you ignore it because you're writing and not reading. (You should be using select.) So you write to a connection that has been closed by the other end and the oether end's TCP responds with an RST.
21. After the chroot(), calls to socket() are failing. Why?
On systems where sockets are implemented on top of Streams (e.g. all SysV-based systems, presumably including Solaris), the socket() function will actually be opening certain special files in /dev. You will need to create a /dev directory under your fake root and populate it with the required device nodes (only).
Your system documentation may or may not specify exactly which device nodes are required; suggested checking the man page for ftpd, which should list the files you need to copy and devices you need to create in the chroot'd environment.)
A less-obvious issue with chroot() is if you call syslog(), as many daemons do; syslog() opens (depending on the system) either a UDP socket, a FIFO or a Unix-domain socket. So if you use it after a chroot() call, make sure that you call openlog() *before* the chroot.
22. What is the difference between read() and recv()?
read() is equivalent to recv() with a flags parameter of 0. Other values for the flags parameter change the behaviour of recv(). Similarly, write() is equivalent to send() with flags == 0.
It is unlikely that send()/recv() would be dropped; perhaps someone with a copy of the POSIX drafts for socket calls can check...
Portability note: non-unix systems may not allow read()/write() on sockets, but recv()/send() are usually ok. This is true on Windows and OS/2, for example.
First off, be sure you really want to use it in the first place. It will disable the Nagle algorithm (see ``2.11 How can I force a socket to send the data in its buffer?''), which will cause network traffic to increase, with smaller than needed packets wasting bandwidth. Also, from what I have been able to tell, the speed increase is very small, so you should probably do it without TCP_NODELAY first, and only turn it on if there is a problem.
Here is a code example,
with a warning about using it
int flag = 1;
int result = setsockopt(sock,
/* socket affected */
IPPROTO_TCP,
/* set option at TCP level */
TCP_NODELAY,
/* name of option */
(char *) &flag,
/* the cast is historical cruft */
sizeof(int));
/* length of option value */
if (result < 0)
... handle the error ...
TCP_NODELAY is for a specific purpose; to disable the Nagle buffering algorithm. It should only be set for applications that send frequent small bursts of information without getting an immediate response, where timely delivery of data is required (the canonical example is mouse movements).
24. Whats the difference between select() and poll()?
The basic difference is that select()'s fd_set is a bit mask and therefore has some fixed size. It would be possible for the kernel to not limit this size when the kernel is compiled, allowing the application to define FD_SETSIZE to whatever it wants (as the comments in the system header imply today) but it takes more work. 4.4BSD's kernel and the Solaris library function both have this limit. But I see that BSD/OS 2.1 has now been coded to avoid this limit, so it's doable, just a small matter of programming. :-) Someone should file a Solaris bug report on this, and see if it ever gets fixed.
With poll(), however, the user must allocate an array of pollfd structures, and pass the number of entries in this array, so there's no fundamental limit. As Casper notes, fewer systems have poll() than select, so the latter is more portable. Also, with original implementations (SVR3) you could not set the descriptor to -1 to tell the kernel to ignore an entry in the pollfd structure, which made it hard to remove entries from the array; SVR4 gets around this. Personally, I always use select() and rarely poll(), because I port my code to BSD environments too. Someone could write an implementation of poll() that uses select(), for these environments, but I've never seen one. Both select() and poll() are being standardized by POSIX 1003.1g.
25. Where can a get a library for programming sockets?
There is the Simple Sockets Library by Charles E. Campbell, Jr. PhD. and Terry McRoberts. The file is called ssl.tar.gz, and you can download it from this faq's home page. For c++ there is the Socket++ library which is on ftp://ftp.virginia.edu/pub/socket++-1.10.tar.gz. There is also C++ Wrappers. The file is called ftp://ftp.huji.ac.il/pub/languages/C++/. Thanks to Bill McKinnon for tracking it down for me! From http://www.cs.wustl.edu/~schmidt you should be able to find the ACE toolkit. PING Software Group has some libraries that include a sockets interface among other things. You can find them at http://love.geology.yale.edu/~markl/ping.
26. Why do I get EPROTO from read()?
EPROTO means that the protocol encountered an unrecoverable error for that endpoint. EPROTO is one of those catch-all error codes used by STREAMS-based drivers when a better code isn't available.
Not quite to do with EPROTO from read(), but I found out once that on some STREAMS-based implementations, EPROTO could be returned by accept() if the incoming connection was reset before the accept completes.
On some other implementations, accept seemed to be capable of blocking if this occured. This is important, since if select() said the listening socket was readable, then you would normally expect not to block in the accept() call. The fix is, of course, to set nonblocking mode on the listening socket if you are going to use select() on it.
27. Why does it take so long to detect that the peer died?
Because by default, no packets are sent on the TCP connection unless there is data to send or acknowledge.
So, if you are simply waiting for data from the peer, there is no way to tell if the peer has silently gone away, or just isn't ready to send any more data yet. This can be a problem (especially if the peer is a PC, and the user just hits the Big Switch...).
One solution is to use the SO_KEEPALIVE option. This option enables periodic probing of the connection to ensure that the peer is still present. BE WARNED: the default timeout for this option is AT LEAST 2 HOURS. This timeout can often be altered (in a system-dependent fashion) but not normally on a per-connection basis (AFAIK).
RFC1122 specifies that this timeout (if it exists) must be configurable. On the majority of Unix variants, this configuration may only be done globally, affecting all TCP connections which have keepalive enabled. The method of changing the value, moreover, is often difficult and/or poorly documented, and in any case is different for just about every version in existence.
If you must change the value, look for something resembling tcp_keepidle in your kernel configuration or network options configuration.
If you're sending to the peer, though, you have some better guarantees; since sending data implies receiving ACKs from the peer, then you will know after the retransmit timeout whether the peer is still alive. But the retransmit timeout is designed to allow for various contingencies, with the intention that TCP connections are not dropped simply as a result of minor network upsets. So you should still expect a delay of several minutes before getting notification of the failure.
The approach taken by most application protocols currently in use on the Internet (e.g. FTP, SMTP etc.) is to implement read timeouts on the server end; the server simply gives up on the client if no requests are received in a given time period (often of the order of 15 minutes). Protocols where the connection is maintained even if idle for long periods have two choices:
1. use SO_KEEPALIVE
2. use a higher-level keepalive mechanism (such as sending a null request to the server every so often).
28. What are the pros/cons of select(), non-blocking I/O and SIGIO?
Using non-blocking I/O means that you have to poll sockets to see if there is data to be read from them. Polling should usually be avoided since it uses more CPU time than other techniques.
Using SIGIO allows your application to do what it does and have the operating system tell it (with a signal) that there is data waiting for it on a socket. The only drawback to this soltion is that it can be confusing, and if you are dealing with multiple sockets you will have to do a select() anyway to find out which one(s) is ready to be read.
Using select() is great if your application has to accept data from more than one socket at a time since it will block until any one of a number of sockets is ready with data. One other advantage to select() is that you can set a time-out value after which control will be returned to you whether any of the sockets have data for you or not.
29. How can I force a socket to send the data in its buffer?
You can't force it. Period. TCP makes up its own mind as to when it can send data. Now, normally when you call write() on a TCP socket, TCP will indeed send a segment, but there's no guarantee and no way to force this. There are lots of reasons why TCP will not send a segment: a closed window and the Nagle algorithm are two things to come immediately to mind.
Setting this only disables one of the many tests, the Nagle algorithm. But if the original poster's problem is this, then setting this socket option will help.
A quick glance at tcp_output() shows around 11 tests TCP has to make as to whether to send a segment or not.
As you've surmised, I've never had any problem with disabling Nagle's algorithm. Its basically a buffering method; there's a fixed overhead for all packets, no matter how small. Hence, Nagle's algorithm collects small packets together (no more than .2sec delay) and thereby reduces the amount of overhead bytes being transferred. This approach works well for rcp, for example: the .2 second delay isn't humanly noticeable, and multiple users have their small packets more efficiently transferred. Helps in university settings where most folks using the network are using standard tools such as rcp and ftp, and programs such as telnet may use it, too.
However, Nagle's algorithm is pure havoc for real-time control and not much better for keystroke interactive applications (control-C, anyone?). It has seemed to me that the types of new programs using sockets that people write usually do have problems with small packet delays. One way to bypass Nagle's algorithm selectively is to use "out-of-band" messaging, but that is limited in its content and has other effects (such as a loss of sequentiality) (by the way, out-of- band is often used for that ctrl-C, too).
So to sum it all up, if you are having trouble and need to flush the socket, setting the TCP_NODELAY option will usually solve the problem. If it doesn't, you will have to use out-of-band messaging, but according to Andrew, "out-of-band data has its own problems, and I don't think it works well as a solution to buffering delays (haven't tried it though). It is not 'expedited data' in the sense that exists in some other protocols; it is transmitted in-stream, but with a pointer to indicate where it is."
30. How come select says there is data, but read returns zero?
The data that causes select to return is the EOF because the other side has closed the connection. This causes read to return zero.
31. How do I send [this] over a socket?
Anything other than single bytes of data will probably get mangled unless you take care. For integer values you can use htons() and friends, and strings are really just a bunch of single bytes, so those should be OK. Be careful not to send a pointer to a string though, since the pointer will be meaningless on another machine. If you need to send a struct, you should write sendthisstruct() and readthisstruct() functions for it that do all the work of taking the structure apart on one side, and putting it back together on the other. If you need to send floats, you may have a lot of work ahead of you. You should read RFC 1014 which is about portable ways of getting data from one machine to another (thanks to Andrew Gabriel for pointing this out).
See that send()/write() can generate SIGPIPE. Is there any advantage to handling the signal, rather than just ignoring it and checking for the EPIPE error? Are there any useful parameters passed to the signal catching function?
In general, the only parameter passed to a signal handler is the signal number that caused it to be invoked. Some systems have optional additional parameters, but they are no use to you in this case.
My advice is to just ignore SIGPIPE as you suggest. That's what I do in just about all of my socket code; errno values are easier to handle than signals (in fact, the first revision of the FAQ failed to mention SIGPIPE in that context; I'd got so used to ignoring it...)
There is one situation where you should not ignore SIGPIPE; if you are going to exec() another program with stdout redirected to a socket. In this case it is probably wise to set SIGPIPE to SIG_DFL before doing the exec().
33. Why do I keep getting EINTR from the socket calls?
This isn't really so much an error as an exit condition. It means that the call was interrupted by a signal. Any call that might block should be wrapped in a loop that checkes for EINTR, as is done in the example code .
34. What are socket exceptions? What is out-of-band data?
Unlike exceptions in C++, socket exceptions do not indicate that an error has occured. Socket exceptions usually refer to the notification that out-of-band data has arrived. Out-of-band data (called "urgent data" in TCP) looks to the application like a separate stream of data from the main data stream. This can be useful for separating two different kinds of data. Note that just because it is called "urgent data" does not mean that it will be delivered any faster, or with higher priorety than data in the in-band data stream. Also beware that unlike the main data stream, the out-of-bound data may be lost if your application can't keep up with it.
35. How do I convert a string into an internet address?
If you are reading a host's address from the command line, you may not know if you have an aaa.bbb.ccc.ddd style address, or a host.domain.com style address. What I do with these, is first try to use it as a aaa.bbb.ccc.ddd type address, and if that fails, then do a name lookup on it. Here is an example:
/* Converts ascii text to in_addr struct.
/* NULL is returned if the
address can not be found. */
struct in_addr *atoaddr(char *address) {
struct hostent *host;
static struct in_addr saddr;
/* First try it as aaa.bbb.ccc.ddd. */
saddr.s_addr = inet_addr(address);
if (saddr.s_addr != -1) {
return &saddr;
}
host = gethostbyname(address);
if (host != NULL) {
return (struct in_addr *) *host->h_addr_list;
}
return NULL;
}