Introduction, addressing scheme, architecture, protocols, client/server ports. Email: addresses, how to find them using finger, whois, X500, knowbot. Mailing lists, List servers: ListServ, Majordomo, UseNet. News readers: trn (for Unix), Trumpet (for Windows). UNIX's Internet Utilities: Telnet, FTP. Internet File Types, Archiving & Compression. Search Engines: Archie, Gopher, WAIS. World Wide Web Hypertext Document System.
The Internet is a world-wide network of networks. It comprises 20,000 networks with a total of 3,000,000 host computers growing at 100,000 hosts per month. You can connect your computer to the Internet via a permanent phone line or the dial-up phone network. Either way, your computer can be connected to the Internet either as a host within the Internet or as a terminal of a host owned by an access provider. I wonder if there is also a way of connecting free via AMSAT through the Amateur packet radio network?
For a permanent direct high-speed connection, the cost is prohibitive for all but large establishments. For dial-up, the connection rental is about £6.50 a month plus BT's normal local-call rate charges for the time you spend on the phone line. Connection via packet radio is free. Almost all services on the Internet are free, world-wide. They are mostly provided by American government departments, colleges and universities, voluntary organisations and, as good PR exercises, by very large commercial organisations like IBM.
All host computers on the Internet are in reality UNIX boxes. To use Internet services effectively and efficiently you need to be familiar with UNIX. Each host on the Internet has a unique 32-bit address:
Class A and Class B networks can be divided into subnets within their organisations. Users who need more than the 256 hosts that a Class C network can provide can have Class C supernets which comprise a group of consecutively addressed Class C networks. Some hosts — especially routers — can have more than one Internet address. This is because they are connected to more than one network. All service-providing hosts (servers) on the Internet also have names whose formats conform to the Domain Naming System as follows:
These are translated into numeric host addresses by reference to name servers. Each name server maintains a list of the names and addresses of hosts in its vicinity. Given a host name, it returns the address.
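The 32-bit addresses and the Class A, B and C networks mentioned above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample addresses are made up for the example, and the class boundaries are the usual classful ones decided by the leading bits of the address.

```python
def dotted_to_int(address: str) -> int:
    """Pack a dotted-quad address such as '192.0.2.7' into one 32-bit integer."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a valid dotted-quad address: {address!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet   # shift in one byte at a time
    return value


def address_class(address: str) -> str:
    """Return the classful network class, decided by the leading bits."""
    first_octet = dotted_to_int(address) >> 24
    if first_octet < 128:
        return "A"       # leading bit 0
    if first_octet < 192:
        return "B"       # leading bits 10
    if first_octet < 224:
        return "C"       # leading bits 110
    return "other"       # multicast or reserved ranges


# Sample addresses are illustrative only.
print(dotted_to_int("192.0.2.7"))
print(address_class("10.1.2.3"), address_class("172.16.0.1"), address_class("192.0.2.7"))
```

A Class C address leaves only the last byte for hosts, which is where the limit of 256 host numbers per Class C network comes from.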
The Internet is not a single world-wide network of computers. It is a network of networks-of-computers. The things that link the networks together to form the Internet are called routers. They themselves are hosts. They simply contain the necessary routing software. An example of how the Internet is made up is shown below.
Originally the three hosts shown were probably the main servers for independent company and campus local area networks and a conventional host with a bunch of dumb terminals. Then, when the Internet arrived, they were equipped with the necessary routing and Internet servicing software and linked together to form part of the Internet backbone.
Dumb terminals access Internet services through the UNIX shell of the host to which they are connected. They can also use various items of client software dedicated to specific services like email. LAN-based and dial-up PCs can work in the same way, but PCs have the advantage of being able to run client software directly. Client software for a PC is usually more advanced than what is available to a dumb terminal on the local UNIX host. A UNIX workstation on a LAN has all the UNIX facilities of the host router and also has its own Internet address. It can therefore use all the UNIX Internet client software and indeed the latest Xwindows-based GUI client software.
The local area networks (LANs) can be of different physical types like token ring or Ethernet. They can be different types of Ethernet — thicknet, thinnet or twisted pair & hub. Networks of the same type can be linked directly by 'bridges'. They can be running any of the popular LAN protocols — Novell, 3Com, Banyan, DECnet, SNA and TCP/IP.
Networks of the same physical type but running different protocols can be linked by gateways. However, gateways are application-specific in that the way you convert between protocols is different for email than it is for a terminal session.
However, the Internet backbone always uses the Internet Protocol (IP) and the hosts are normally connected via fast digital trunks. Small dial-in UNIX hosts are connected by the Point-to-Point Protocol, PPP.
The Internet Protocol (IP) is the way in which data is sent from one host to another across the Internet. All data to be sent is split up into packets of from 200 to 2000 bytes (1536 bytes is the traditional standard size for a packet). Each packet is headed with the address of the host it is bound for, and also the return address of the host that sent it. The format of an IP packet is shown below:
Each packet then moves across the Internet like a mail item being forwarded from host to host en route until it reaches its destination host. Each en-route host must be able to understand IP in order to be able to forward each packet to the next waypoint on the fastest permissible route to its final destination.
IP tries its best to deliver packets. However, 1% of packets may not reach their destinations. They get lost due to transmission errors. The Internet's Transmission Control Protocol (TCP) is a higher level protocol than IP. It runs on top of IP. It provides what looks like a dedicated connection or virtual circuit between client and server across the Internet.
TCP acts like a certified mail system ensuring that all packets sent are in fact received. The sending TCP numbers each packet before it is sent. The receiving TCP tells the other what it has received and what it has not. Any lost packets are then re-sent.
Unlike the X25 protocol which validates the integrity of all packets at each en-route data switch, TCP operates only at the end of the virtual circuit. The routers take no part in the TCP protocol.
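The certified-mail idea can be shown with a toy simulation. It is not how a real TCP is written: there are no timers, windows or checksums, and the set of packets that get lost is simply handed in as a parameter. It only shows the numbering, the receiver's report of what arrived, and the re-sending, all done at the two ends of the circuit.

```python
def send_reliably(packets, lose_first_attempt):
    """Deliver every packet, re-sending any the receiver reports missing.

    packets: list of payload strings, numbered by their list position.
    lose_first_attempt: sequence numbers dropped on their first send
    (a hypothetical stand-in for transmission errors).
    Returns (reassembled_data, number_of_transmissions).
    """
    received = {}                          # sequence number -> payload
    outstanding = set(range(len(packets)))
    already_lost = set()
    transmissions = 0
    while outstanding:
        for seq in sorted(outstanding):
            transmissions += 1
            if seq in lose_first_attempt and seq not in already_lost:
                already_lost.add(seq)      # this packet vanishes en route
                continue
            received[seq] = packets[seq]
        # The receiver reports what it has; everything else is sent again.
        outstanding = set(range(len(packets))) - set(received)
    data = "".join(received[seq] for seq in sorted(received))
    return data, transmissions


data, sent = send_reliably(["He", "ll", "o!"], lose_first_attempt={1})
print(data, sent)
```

Here packet 1 is lost once, so four transmissions are needed to deliver three packets, and the data still arrives complete and in order.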
TCP is for interactive work. Other high-level protocols such as FTP are used on the Internet also. These are discussed later.
Inside any given host there will be a number of different servers running simultaneously. Each server provides a particular kind of user service or application. Each server has a port number.
A client program communicates with its remote server by sending it a request. The request is broken down into packets and sent across the Internet to the server host. There the packets are re-assembled into the original request. The request is then put onto a queue to await service.
As soon as the server concerned is free it processes the request and produces the response. The response is then broken down into packets by the TCP and sent to the client over the Internet. The client's TCP then re-assembles the packets into the response and delivers it to the client program.
Each application server has a permanently-assigned port number which is universally known throughout the Internet. Each time a user starts a client program it is assigned an arbitrary port number at the time. A port number is in effect the address of a piece of software. It is where you send data to it.
Each request from a client contains the port number of its server on the remote host together with its own port number. When the request gets to the other end, the server's port number ensures that it is given to the right server. After the server has processed the request and produced the response, it heads the response with the port number of the client that sent the request. This ensures that when it gets to the sender's host, the sender's host gives it to the right client. In the diagram, the telnet server's port is used as an example. Its universal port number is 23.
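The way port numbers steer requests and responses can be sketched without any real networking. In the sketch below the dictionaries stand in for packets, the starting value for client ports is an arbitrary assumption, and the finger service on its well-known port 79 is used purely as a sample server.

```python
import itertools

FINGER_PORT = 79                       # well-known port of the finger server
_client_ports = itertools.count(1025)  # hypothetical pool of client port numbers

# Servers running on the remote host, keyed by their permanent port number.
servers = {
    FINGER_PORT: lambda text: f"finger server got: {text}",
}


def make_request(text, server_port):
    """Build a request carrying the server's port and the client's own port."""
    return {"dst_port": server_port, "src_port": next(_client_ports), "data": text}


def serve(request):
    """Give the request to the right server and address the reply to the client."""
    handler = servers[request["dst_port"]]
    return {
        "dst_port": request["src_port"],   # the reply goes back to the sender's port
        "src_port": request["dst_port"],
        "data": handler(request["data"]),
    }


request = make_request("rob", FINGER_PORT)
reply = serve(request)
print(request["src_port"], reply["dst_port"], reply["data"])
```

Two clients started one after the other get different port numbers, so the replies cannot be handed to the wrong program even though both talk to the same server port.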
Companies and other organisations (domains) have Internet addresses such as ebs.co.uk. Furthermore, individual computers (both hosts and workstations) within a domain can also have individual identifiers. These can be names like sharon, tracy and eustace as shown below:
Likewise, a person within a domain can have an individual identity. It is quite separate from that of his workstation. It is his mailbox address, examples of which are shown above for Ruby and me. Everyone's mailbox is usually held on the local host, which runs a mail daemon like smail. The daemon uses the Simple Mail Transfer Protocol (SMTP), which requires the host to be on all the time, waiting for electronic mail to arrive off the Internet.
A mailbox is an ordinary text file on the UNIX host. The individual messages within it are separated by a separator which is usually a line containing four Ctrl-A characters. These look like smiley faces in the PC8 character set. As each new message comes in it is appended to the mailbox file.
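Reading such a mailbox file back into separate messages is straightforward. The sketch below assumes that each message is followed by one separator line of exactly four Ctrl-A characters; real mail software may also write a separator before each message, so treat the details as an assumption. The function names are invented for the example.

```python
SEPARATOR = "\x01" * 4   # a line of four Ctrl-A characters


def split_mailbox(text):
    """Split the text of a mailbox file into individual messages."""
    messages, current = [], []
    for line in text.splitlines():
        if line == SEPARATOR:
            if current:                       # close the message collected so far
                messages.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:                               # last message without a trailing separator
        messages.append("\n".join(current))
    return messages


def append_message(text, message):
    """Append a newly arrived message, followed by a separator line."""
    return text + message.rstrip("\n") + "\n" + SEPARATOR + "\n"


box = append_message("", "From: ruby\nSubject: Lunch\n\nOne o'clock?")
box = append_message(box, "From: rob\nSubject: Re: Lunch\n\nFine.")
print(len(split_mailbox(box)))
```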
To send email you need a mail client like Berkeley mail, xmail, pine, elm or Eudora (PC Windows). The last four allow you to write, queue and send outgoing mail. They also allow you to select from a list and read received mail.
Mail robots can respond to certain email messages automatically. They are used for such things as responding to requests for files or news.
MIME - Multipurpose Internet Mail Extensions. This is a standard for encoding binary (exe, dat, image, video, sound, etc.) files as 7-bit text to be sent as email messages and decoded by the recipient. Pine and Eudora can display/play MIME messages.
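As an illustration of turning binary into 7-bit text and back, here is a small sketch using base64, which is one of the encodings MIME defines. It leaves out the MIME headers that a real mail client wraps around the encoded text, and the function names are made up for the example.

```python
import base64


def encode_attachment(data):
    """Turn arbitrary bytes into lines of plain 7-bit text."""
    return base64.encodebytes(data).decode("ascii")


def decode_attachment(text):
    """Recover the original bytes at the recipient's end."""
    return base64.decodebytes(text.encode("ascii"))


raw = bytes(range(256))                    # every possible byte value
text = encode_attachment(raw)
print(all(ord(ch) < 128 for ch in text))   # only 7-bit characters are used
print(decode_attachment(text) == raw)      # the round trip loses nothing
```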
Mail Sorters look for key words on the 'From' and 'Subject' lines of each message as it is received. They put each message into a sub-mailbox according to sender and subject categories. I think building a pair of index files would be slicker. UNIX Mail sorting programs procmail and delivermail require shell scripts to be written. Eudora and Pine have built-in mail filters that do the same thing.
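The sorting idea amounts to a short list of rules tested against the 'From' and 'Subject' lines. The sketch below is not how procmail or any other named program works internally; the rules, mailbox names and sample messages are all hypothetical.

```python
RULES = [
    # (header, keyword, sub-mailbox): all hypothetical examples
    ("From", "ruby", "ruby"),
    ("Subject", "whale", "whales"),
]


def headers_of(message):
    """Return a dict of header name -> value from the top of a message."""
    headers = {}
    for line in message.splitlines():
        if not line.strip():
            break                          # a blank line ends the headers
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip()] = value.strip()
    return headers


def choose_mailbox(message, rules=RULES, default="inbox"):
    """Pick the first sub-mailbox whose keyword appears in the named header."""
    headers = headers_of(message)
    for header, keyword, mailbox in rules:
        if keyword.lower() in headers.get(header, "").lower():
            return mailbox
    return default


print(choose_mailbox("From: someone\nSubject: Whale counts in the Atlantic\n\nBlah."))
```

Rules are tried in order, so a message that matches several of them lands in the sub-mailbox of the first rule that fits.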
RFC822 is the standard electronic mail addressing system used on the Internet. Internet mail addresses comprise two parts:
user name: (eg Robert.J.Morton or rob.morton)
domain id: (eg ebs.co.uk)
This is written as email@example.com
That's all there is to it.
RFC822's arch rival is the X400 standard devised by the CCITT.
Some private email services which are connected to the Internet use X400. Among these, X400 is becoming more and more widely used. The most relevant of the labyrinth of pre-defined formal fields that make up an X400 email address are as follows:
|Q||Generation qualifier (Jr III etc.)|
|A||Administration domain name|
|P||Private domain name|
|DD||Domain Defined Attributes|
An X400 email address is of the form:
/S = Morton
/G = Robert
/I = J
/O = Eastern Business Systems
/A = CompuMail
/P = EBS
/C = UK
/DD = rob
The order of the slash-separated items does not matter. The best way to find the exact form of an X400 subscriber's email address is to phone them and ask them to send you a message. Get their address off the 'From' line on the message you receive.
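Because the order of the items does not matter, an address of this form is naturally handled as a set of field/value pairs. The following sketch assumes that no value itself contains a slash or an equals sign, which real X400 addresses are not guaranteed to respect; the function name is invented for the example.

```python
def parse_x400(address):
    """Turn '/S=Morton/G=Robert/...' into a dict of field -> value."""
    fields = {}
    for item in address.split("/"):
        if not item.strip():
            continue                       # skip the empty piece before the first slash
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"field without '=': {item!r}")
        fields[key.strip().upper()] = value.strip()
    return fields


a = parse_x400("/S = Morton /G = Robert /I = J /O = Eastern Business Systems"
               " /A = CompuMail /P = EBS /C = UK /DD = rob")
b = parse_x400("/DD=rob/C=UK/P=EBS/A=CompuMail/O=Eastern Business Systems/I=J/G=Robert/S=Morton")
print(a == b)   # the same address, whatever the order of the items
```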
And there are more...
Many email systems manufacturers, private email providers and indeed user organisations have their own proprietary addressing systems. This means that some people's email addresses may become very large and complicated. For instance, AT&T Mail provides gateways to companies' internal mail systems. So you can have internal sub-domains appended to the individual's username eg:
See pages 111 - 116 for private email services and their addressing peculiarities.
However, if you specify one of the users by name, eg: finger rob, it returns all the above plus which directory they are in, what project they are working on, and what their plan is. The project details are kept in a file called .project and the plan is given in a file called .plan. Only the first line of the .project file is displayed, but up to 10 lines of the .plan file are displayed.
You can finger a remote host by specifying its Internet name after the finger command eg: finger @ebs.co.uk. This will return who is logged in at the moment at ebs.co.uk. And you can finger an individual at a remote host viz: finger firstname.lastname@example.org. Not all remote hosts allow you to finger their users.
whois -h whois.ebs.co.uk rob
The commonest X500 service is called FRED (FRont End to Directories). You use it by telnetting to wp.psi.com or wpl.psi.com and logging in as FRED. If you want to find the email address of Pierre Petit whom you know works at some French university you aren't sure of, you then type in something like:
whois Pierre Petit -org *.ac -geo @c=FR
The -org switch takes a regular expression for the organisation name. The -geo switch takes a regular expression for the domain.
telnet info.cnri.reston.va.us 185
The 185 is the port number of the Knowbot server. When you get the prompt, just type in the person's name and wait. This can be several minutes. Knowbot has access to many private email service directories that the X500 systems do not, so it is worth trying.
An email mailing list is associated with a discussion topic. Any one person on a list can send a message to the list which then re-mails it to everyone else who is currently on the list.
A manual list is run and maintained by a human being. To get on to, send a message to, and get off a manual list called eg save-the-whales at ebs.co.uk, send the following messages respectively:
Please add me to the save-the-whales list.
Subject: Whale counts in the Atlantic
Blah blah blah.
Please remove me from the save-the-whales list.
An automatic list is run by a list management program.
ListServ is a clunky IBM mainframe automatic list management program. To get on to, send a message to, and get off an email list called ECO-L, send the following messages respectively:
SUB ECO-L Rob Morton
Subject: Global Warming
Blah blah blah.
To contact the human life-form in charge of an automatically run list, use the address eg
See p120 for more.
|To subscribe, send:||subscribe explosive cargo|
|To unsubscribe send:||unsubscribe explosive cargo|
|The human owner is:||owner-Majordomo@world.std.com|
|comp||Topics relating to computers (lots of fairly meaty discussions)|
|sci||Topics relating to one of the sciences (also fairly meaty)|
|rec||Recreational (sport, art, hobbies etc.)|
|soc||Social (both sociological issues and just plain socialising)|
|news||Topics to do with NetNews itself|
|misc||Topics that do not fit anywhere else|
|talk||Long arguments — frequently political|
Each news group contains subdivisions, the lowest of which contains the actual current articles on the topic concerned. All the main groups are, in theory, of world-wide interest. However, regional subdivisions exist, especially for such topics as odd bits of hardware for sale. Geographic division identifiers are things like: world, na = North America, usa = United States, can = Canada, uk = United Kingdom, ne = New England, ba = Bay Area (California).
You can subscribe to and unsubscribe from any news group you wish at any time. You can subscribe to many when you're slack and cut down when busy. You can read articles in news groups you are currently subscribed to. You can respond to them by emailing their authors directly or by sending a response to the relevant news group. You can also submit articles yourself. Most news groups have a human moderator who filters out what, in his perception, are cranky responses and articles.
The technical set-up of the UseNet system is shown below:
The UNIX trn and Windows Trumpet news clients provide you with the means to subscribe and unsubscribe to news groups, read and respond to articles, and originate and submit articles of your own. They also enable you to selectively reject articles on certain topics or by specified authors.
Articles are usually text files, but do not have to be. Articles can be in the form of GIF and JPEG image files, MPEG movie-clip files, executable files and data files. They can also be all these things packed into a UNIX 'shar' (shell archive) file which can also contain direct UNIX commands to your host. Beware: such could be a 'Trojan horse'. If you get an article in the form of a 'shar' file, use a 'shar' sanitizer on it.
If you are on a UNIX host or workstation coupled to the Internet, the 'telnet', 'rlogin' (remote Login) and 'rsh' (remote shell) commands allow you to enter direct UNIX commands on a remote host as if you were one of its local users. To 'telnet' a remote host you enter:
% telnet dellboy.ebs.co.uk
Trying 240.197.016.001 ... Connected to ebs.co.uk
Escape character is '^]'.
System V UNIX (ebs.co.uk)
Terminal type (default VT100):
If the remote host does not echo what you type, you can turn on local echo with Ctrl-E. To close your session on the remote host, type the escape character in this case ^] (meaning you hold down the control key and press the right-hand square bracket key). If this does not work straight away, press the RETURN key. You will then get through to your local telnet prompt. Type quit. Telnet then confirms that it has disconnected from the remote host.
Some hosts operate as IBM 3270 servers. Instead of the UNIX command line, they present you with screens containing fields like a form that you fill in. For these systems use 'tn3270' instead of telnet.
If you are on a PC, you can get a Windows-based TCP/IP package called Chameleon. This allows you to choose a host to log on to from a list box by clicking the host name. You then see the login prompt from the remote host and go on from there as above. It also supports 'tn3270'.
Where a group of hosts all share the same set of users 'rlogin' can be used instead. This does not require you to login to remote hosts as this is done automatically from user-details in a common system file. If you are a registered user on unrelated hosts, you can set up your details in a '.rhosts' file so that you can use 'rlogin'. See page 177.
% rlogin dellboy.ebs.co.uk
Last login: 14:30:28 Fri 08Jan96 from constance
System V UNIX (dellboy.ebs.co.uk)
Terminal type (default VT100):
~. (The ~. (tilde dot) is rlogin's escape sequence)
If all you want to do on the remote host is execute a single command you can use 'rsh'. If on your own machine you are known as 'rob' and on the remote machine you are known as 'robboge', you log on as follows:
% rsh dellboy.ebs.co.uk -l robboge ls -R
Once the command has been executed, you are back on your own local machine's command line.
FTP enables you to copy files between your own local host and any other host on the Internet. If you are on a PC you must connect with your local host in the usual way as a terminal to conduct the FTP session.
If you want to transfer any files you have acquired from around the world to your PC you must download them using Kermit or zmodem. An FTP session is of the following form:
% ftp dellboy.ebs.co.uk
Connected to dellboy.ebs.co.uk.
220 ebs FTP server (Version 4.1 8/7/95) ready.
Name (dellboy.ebs.co.uk): rob
331 Password required for rob.
230 User rob logged in.
ftp> get README README.TXT
150 Opening ASCII mode data connection for README (12696 bytes)
226 Transfer complete.
local: README.TXT remote: README
12979 bytes received in 28 seconds (0.44 Kbytes/s)
The commands you can use at the 'ftp>' prompt are as follows:
|cd rdirname||changes the directory to rdirname on remote host|
|lcd dirname||changes the directory to dirname on local host|
|dir pattern||lists files in current directory|
|cdup||changes up to next higher directory|
|asc||set to transfer files in ASCII mode (text files)|
|bin||transfer files in binary mode (images, data etc)|
|get rcrap crap.txt||copy file rcrap on remote to crap.txt on local|
|put crap.txt rcrap||copy file crap.txt on local to rcrap on remote|
|del rcrap||delete the file rcrap on the remote host|
|mget pattern||get the files in remote host's current directory whose names match pattern|
|mput pattern||put the files in local host's current directory whose names match pattern|
|mdel pattern||delete files on remote whose names match pattern|
|prompt||stop Yes/No prompts at each file for mget et al|
|quit||disconnect and get back to local's command line|
Everything else in the session is a message telling you what is going on. The 3-digit number before each message tells 'ftp' what is going on. FTP clients for PC Windows exist but are slow due to their insistence on loading lots of directories at the start of each session. Host-to-host file copying can also be done with the 'rcp' UNIX command. This has the same conditions and restrictions as 'rlogin' and 'rsh'. You have to be known and authorised by each host you deal with. See p203.
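How a client program might read those 3-digit numbers can be sketched as follows. The first digit of an FTP reply code gives the general outcome, so a client can act on that digit without understanding the English text. The sketch handles single-line replies only, and the function and table names are invented for the example.

```python
REPLY_CLASSES = {
    "1": "positive preliminary (action started, another reply will follow)",
    "2": "positive completion",
    "3": "positive intermediate (the server needs more input)",
    "4": "transient negative (try again later)",
    "5": "permanent negative",
}


def parse_reply(line):
    """Split an FTP reply line into (code, text, meaning of the first digit)."""
    code, _, text = line.partition(" ")
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"not an FTP reply line: {line!r}")
    return int(code), text, REPLY_CLASSES.get(code[0], "unknown")


for line in ["331 Password required for rob.",
             "230 User rob logged in.",
             "226 Transfer complete."]:
    print(parse_reply(line))
```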
The Internet supports two fundamental kinds of file transfer: ASCII and binary. FTP automatically translates from one ASCII format to another if the sending and receiving hosts use different text file formats. Binary files are always transferred exactly as they are. A sample of the different types of text and binary files is:
|TXT||plain English, French, Spanish, German...|
|C||program source files in C, Pascal etc..|
|INI||program initialisation and control files|
|PS||postscript printer control files|
Plus any other type of file containing ASCII characters.
|GIF||Graphics Interchange Format|
|JPEG||Joint Photographic Experts Group|
|MPEG||Moving Picture Experts Group|
|EXE||Executable program files|
|DAT||program data files|
|ARC||archive files, & ZIP - compressed files|
You can send files as they are, but for large files it is better to compress them before transmission to reduce the amount of data to be transferred and hence network time. To send large numbers of files it is also better to compile them into a single archive file before compressing and transmitting them. The whole process of archiving and compressing is shown below. You reverse the process at the other end.
|compress||classic UNIX compression utility|
|zcat||views compressed UNIX files without de-compressing|
|tar||old UNIX disc to tape archiver|
|cpio||a re-invention of the above wheel|
|pax||latest UNIX archiver/compressor|
|PKZIP||archiver and compressor for PCs|
|gzip||Free Software Foundation compressor|
|gcat||their compressed file viewer|
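The archive-then-compress process, and its reversal at the other end, can be tried out with Python's standard tarfile module, which writes a tar archive and compresses it gzip-style in one step. This is a sketch working entirely in memory; the file names and contents are made up, and it is not a substitute for the utilities listed above.

```python
import io
import tarfile


def pack(files):
    """Archive a dict of name -> bytes with tar and compress it with gzip."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)              # tar needs the size up front
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def unpack(blob):
    """Reverse the process: decompress, then extract every member."""
    files = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as archive:
        for member in archive.getmembers():
            files[member.name] = archive.extractfile(member).read()
    return files


original = {"readme.txt": b"hello world\n" * 200, "data.dat": bytes(range(256))}
blob = pack(original)
print(len(blob), unpack(blob) == original)
```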
You can get both text and binary files by email by sending a request to an FTP mail server as follows:
FTP email@example.com uuencode
The uuencode option converts binary files into a 7-bit text form that avoids the ASCII control characters 0 - 31. When you get the file from your mailbox you have to uudecode it to get the original binary.
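The encoding itself can be tried with Python's binascii module, which provides the line-by-line uuencode conversion. The sketch omits the 'begin' and 'end' lines that a real uuencoded file carries, and the function names are invented for the example.

```python
import binascii


def uuencode_lines(data):
    """Encode bytes as uuencoded text lines, 45 input bytes per line."""
    return [binascii.b2a_uu(data[i:i + 45]).decode("ascii")
            for i in range(0, len(data), 45)]


def uudecode_lines(lines):
    """Recover the original bytes from uuencoded lines."""
    return b"".join(binascii.a2b_uu(line) for line in lines)


raw = bytes(range(256))
lines = uuencode_lines(raw)
# Apart from the newline that ends each line, only printable characters appear.
print(all(32 <= ord(ch) < 127 for line in lines for ch in line.rstrip("\n")))
print(uudecode_lines(lines) == raw)
```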
Archie is a server daemon that searches the Internet for software. Archie can be contacted by telnet, client or email. The only sensible way to use Archie is to email him with a search request. He will then do the search and leave the results in your mailbox. To search the Internet for a program file called mktr, send the following email message:
Archie then searches all software directories on all the servers on the Internet for occurrences of this file name. He then returns the list to you as an email message which he can post to your mailbox. As well as the program files themselves, servers also hold text files that describe the programs. You can ask Archie to search the Internet for occurrences of the name of the program (or any other string for that matter) within the program description files by sending an email message like:
This asks Archie to search all program description files on the Internet for all occurrences of the name marketeer. Both the 'prog' and 'whatis' commands take a string argument that is assumed to be a UNIX regular expression. This allows you to do things like:
|prog [0-9]||searches for filenames containing digits|
|prog ^[a-z]||searches for filenames beginning with a small letter|
|prog ^birdie.*txt$||This causes Archie to search for filenames that begin with birdie and end with txt. The ^ ties birdie to start of filename, $ ties txt to end of the filename, .* means there can be any number of any type of characters in between.|
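You can try out a pattern on a few file names before mailing it off. The sketch below uses Python's re module, on the assumption that its syntax is close enough to what Archie accepts for simple patterns like these; the file names are made up and the 'prog' function is only a local imitation.

```python
import re

filenames = ["birdie.txt", "birdie-notes.txt", "mktr", "README", "file2.c", "oldbirdie.txt"]


def prog(pattern, names=filenames):
    """Return the names in which the regular expression matches somewhere."""
    return [name for name in names if re.search(pattern, name)]


print(prog("[0-9]"))             # names containing a digit
print(prog(r"^birdie.*txt$"))    # names that begin with birdie and end with txt
```

Note that oldbirdie.txt is not picked up by the second pattern, because the ^ insists that birdie comes at the very start of the name.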
As well as the 'prog' and 'whatis' commands as shown in the above email request messages, Archie will respond to other commands as follows:
|compress||makes Archie compress his reply file before sending it to you as an email message.|
|path||lets you specify a directory path on your host where you would like Archie to put his reply file (assuming you don't want him to put it in your normal incoming email directory)|
|servers||makes Archie send you an up-to-date list of Archie servers on the Internet|
|help||returns the help text for using Archie by email.|
|quit||ends your email request to Archie|
Gopher finds documents on the Internet using menus. Menus can contain 4 types of item: other menus, search items, telnet items and files:
You can 'telnet' gopher, but it is better to use a gopher client program. HGOPHER is a good Windows client written in England by Martyn Hampson and is free. You go trogging through the menus until you get to one listing the files you want. Select the file you want from the menu. Gopher then starts displaying it. Press Q to stop it. Gopher then asks if you want it copied (s) or mailed (m) to you. You can only use (m) if you are 'telnetted'. If you accessed a UNIX Gopher via dial-in from a PC, you can press return and then D to download the file to your PC. Gopher can download any file in its menu whether 'txt' or binary. The UNIX Gopher controls are shown below:
|Enter||select current menu item (where cursor is)|
|u||go back to previous menu (same as left arrow)|
|+||move to next menu page|
|-||move back to previous menu page|
|m||go back to main menu|
|digits||go to the numbered menu item|
|/||search menu for the string that follows it|
|n||search for next match|
|q||quit - leave Gopher|
|>=||get description of current item|
|a||add current item to your bookmark menu|
|A||add this whole menu to your bookmark menu|
|v||view bookmark menu|
|d||delete current bookmark menu|
|m||mail current file to your mailbox|
|s||save current file (not for telnet)|
|p||print current file (not for telnet)|
|D||download current file|
When you come across a menu item you may want to go back to, mark it and then hit 'a' (lower case) to put it on your own bookmark menu. You can build a bookmark menu of all your items of interest as you are trawling through the Internet. Veronica is an index of all Gopher menus which can be accessed on the Internet.
WAIS (Wide Area Information Servers) is implemented on a CM-5 Connection Machine, a massively parallel supercomputer built by a company called Thinking Machines. Apple and Dow Jones are also involved in WAIS. Archie and Gopher only search filenames and titles for a single given string. WAIS searches the actual contents of files for occurrences of sets of strings. Eg. if you set the search string as "florida pie", it will search documents for occurrences of the words "florida" and "pie". It will capture them whether they appear together as "florida pie" or separately. You can 'telnet' WAIS to get into its absolutely horrible UNIX command-line interface. When you 'telnet', you first get a list of about 500 WAIS servers. Press
|o||if required to set WAIS display options (see page 266)|
|w||to enter your search words|
A Search Results Menu appears listing databases, each with a relevance rating from 0-1000. Mark the ones you want as follows:
|j||to move down the list|
|k||to move back up the list|
|J||page down the list (Ctrl-V and Ctrl-D also do this)|
|K||page back up the list (Ctrl-U also does this)|
|/name||search for name within the menu list (can be a partial name)|
|item N°||positions cursor at a menu list item (ie a database).|
|.||shows a short description of the database (or hit space bar)|
|u||mark the database to use for part of your search.|
|s||return to the sources menu where now are shown the databases you have just marked.|
Navigate through this list in the same way as above, using the space bar to select the ones you finally decide to search on. An asterisk appears next to selected databases. Space bar can also de-select an already selected one. Then press w to enter your key words again as before. You then get a list of files. To get them you must then use anonymous FTP to get them to your own Internet host and then if necessary download them to your PC. But for short text items you can type:
|.||to display the contents of the file|
|m||to mail the file to you|
|s||to return for another search session|
|q||to quit WAIS|
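The kind of content search WAIS does, matching each search word separately and giving every result a rating out of 1000, can be imitated in miniature. The scoring rule here (the fraction of search words found) is purely an assumption made for the example and is not WAIS's actual ranking method; the documents are made up too.

```python
def wais_search(query, documents):
    """Rank documents by how many of the query words they contain."""
    words = query.lower().split()
    results = []
    for name, text in documents.items():
        present = set(text.lower().split())
        hits = sum(1 for word in words if word in present)
        if hits:
            score = 1000 * hits // len(words)   # hypothetical 0-1000 rating
            results.append((score, name))
    # Highest rating first; ties are listed alphabetically.
    return sorted(results, key=lambda result: (-result[0], result[1]))


documents = {
    "recipes": "key lime pie from florida",
    "travel": "florida beaches",
    "maths": "the value of pi",
}
print(wais_search("florida pie", documents))
```

The recipes document contains both words and so rates highest, while the travel document is still captured because it contains one of them.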
You can get to WAIS access points in Gopher Menus and the World-wide Web. It's best however to use the client software package WinWais written by the United States Geological Survey. To use this you:
Hypertext Document System
The World-Wide Web allows you to hop across hypertext links between a vast number of different topics in all the databases in all the WWW servers all over the world as depicted below.
The World-Wide Web's hypertext files can have multimedia file links embedded within them. These can be links to still image files (GIF or JPEG), sound files or movie clips (in MPEG files). Information is transferred from the server's hypertext files to the client's browser program using the hypertext transfer protocol http. Image, sound and video files are very large, even for a small image or movie clip. So they take a long time to arrive across the Internet. Text is fast. If you are doing a serious web-walk, it is therefore best to stick to text only until you have got really close to what you want.
If you have suitable client software on your PC such as Mosaic, you can walk the web quite easily. It is a graphics package with Windows, Xwindows and Mac versions. It was written by the National Center for Supercomputing Applications in the USA and is free. It is full of bugs, but is OK to use. Improved versions are emerging all the time. But because it is graphical and displays all images as they come, it is very very very slow, so slow as to be unusable on anything below a Pentium PC or a SUN or SPARC-type UNIX workstation.
It is probably better to 'telnet' to a WWW server and invoke Lynx. This is a character based hypertext browser. It may look frumpy, but it is fast. Hyperlinks to multimedia items are indicated by [IMAGE]. If you want to see the image or run the movie clip you click on the [IMAGE].
Some of the hyperlinks in WWW documents lead to Gopher Menus and WAIS. WWW can launch these for you but it is better to note them and access them directly later through your Gopher or WAIS client software.