Terminal-based high precision search program for this site.

Before the pandemic of paranoia set in during the late 20-teens, an embedded Java search applet occupied the position above, providing high-precision searches of this site. Because browsers no longer support Java, and because I can never be sure how long Java's replacements for web-based programming will last, I have replaced the original applet with a robust command-line program written in C. It is slightly more trouble in that you have to download it, compile it and run it in a terminal on your computer. Nevertheless, the procedure for doing this is short and simple.

  1. Open a terminal by keying Ctrl-Alt-T.

The easiest and most error-free procedure is to mark and copy each emboldened command in turn from the following list and paste it into your terminal, then press the 'Enter' key.

  2. Download and unzip the search program by entering the commands:
    wget http://robmorton.20m.com/software/C-programs/search.zip
    unzip search.zip

If you are running 64-bit Linux, skip to step 5. If you are running any other version of Linux, BSD or System V, you may need to compile the source file 'search.c' to create a compatible version of the file 'search'. Examine the source code to verify what the program does and that it contains no functionality that could possibly 'blow up' your computer. If you are running any other operating system, you can change the source code to suit your C compiler. You can also alter the browser invocation command at the end of the listing to display your results in a different browser if necessary, although the presentation may not be as neat.

  3. Compile the program with the terminal command:
    gcc search.c -o search
    This generates a new version of the executable file 'search'.

  4. Delete the files search.c and search.zip from your home directory:
    rm search.c search.zip

  5. Make the file 'search' executable by entering the command:
    chmod +x search

  6. To do a search, enter the command:
    ./search poverty welfare
    as shown in the terminal illustration above, replacing the sample keywords 'poverty' and 'welfare' with the keywords you wish to search on.

A results page will be displayed in a browser window, from which you can access the relevant web pages. The Firefox browser is invoked because this website only displays completely correctly in Mozilla-based browsers. From now on, when you come back to use the search engine again, steps 1 and 6 are all you need. Enter Ctrl-C on the command line to terminate the program.

About the Original Java Search Engine Applet

You can still use this Java version of the search engine applet in pre-2017 versions of the popular browsers or by downloading the appropriate '.jar' file and running it as a stand-alone program according to the instructions contained herein.

Image of the front panel of the embedded search engine applet.

How to Use This Search Facility

  1. Take some time to think carefully about what you are looking for. Get a mental picture of the notions and concepts involved.

  2. Try to think of a single word which describes what you are looking for as completely and as exclusively as possible. You may enter further keywords to limit the scope of your search. Up to 16 words can be handled but 3 is a sensible maximum.

  3. Enter this word (or words) in the keyword entry field of the search engine applet above. Then click the 'Search' button.

Note: upper and lower case versions of a letter are regarded as the same letter. Separate multiple keywords with spaces, commas or both. The 'Clear' button clears the entry field. Accented letters are regarded as their equivalent without an accent for search purposes.

If the first keyword is found, the title, description and URL of the first relevant document appear in the applet's main display area; otherwise a message appears saying that the keyword could not be found. Use the 'Next' and 'Prev' buttons to scan up and down the list of relevant documents.

  4. Once you have found a document you would like to view, click the 'View Document' button. The full document will then be fetched and displayed in a separate tab in your browser's window. This leaves the search applet running so that you can return and select another document in the retrieved shortlist.

When you have finished reading the document, close its browser tab.

NOTE: When the applet first starts, it connects to the server and downloads the site's keyword index. This can take anything from a split second to about half a minute, depending on line speed and Internet traffic levels. If any problems occur during the downloading of the index or a document, an 'exception' message appears on the message line in red. This states the name of the Java method in which the problem occurred and the type of Java 'exception' (what most programmers used to call an error) which occurred. I would appreciate your reporting to me any such occurrence by email. Thank you.
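
By way of illustration only, the keyword-entry rules given in this section (case is ignored, keywords are separated by spaces, commas or both, and at most 16 are handled) could be sketched in C roughly as follows. This is not the code of the applet or of search.c. The function name split_keywords and the 32-character word limit are assumptions made for the sketch, and the folding of accented letters to their unaccented equivalents is left out.

```c
#include <ctype.h>
#include <string.h>

#define MAX_KEYWORDS 16   /* limit stated in the text above */
#define MAX_WORD_LEN 32   /* assumed limit, for this sketch only */

/* Split `input` on spaces and commas, folding each word to lower case.
   Returns the number of keywords stored in `words`. */
int split_keywords(const char *input, char words[MAX_KEYWORDS][MAX_WORD_LEN])
{
    int count = 0;
    const char *p = input;

    while (*p != '\0' && count < MAX_KEYWORDS) {
        /* skip any run of separators */
        while (*p == ' ' || *p == ',')
            p++;
        if (*p == '\0')
            break;

        /* copy one word; over-long words are truncated */
        int len = 0;
        while (*p != '\0' && *p != ' ' && *p != ',') {
            if (len < MAX_WORD_LEN - 1)
                words[count][len++] = (char)tolower((unsigned char)*p);
            p++;
        }
        words[count][len] = '\0';
        count++;
    }
    return count;
}
```

Called with the input "Poverty, WELFARE", this sketch stores the two keywords "poverty" and "welfare" and returns 2.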

How it Searches

This search engine applet searches only within the domain of this web site. It searches for given keywords within its keyword index.

The index is built off-line by an indexer. The indexer collects all the keywords listed within the keyword meta tag of each HTML file at this web site. It then sorts them into alphabetical order, keeping track of which files each occurred in. It does not extract keywords from the body of the file, i.e. the textual content of the document.
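
The idea of an alphabetically sorted index that remembers which file each keyword came from can be sketched in C as below. This is a minimal illustration and not the actual indexer: the struct entry layout and the function names build_index and find_first are assumptions, and the (keyword, file) pairs are taken as already collected from the meta tags.

```c
#include <stdlib.h>
#include <string.h>

/* One index entry: a keyword and the relative URL of a file listing it.
   (Hypothetical layout, chosen for this sketch.) */
struct entry {
    const char *keyword;
    const char *file;
};

/* Order entries alphabetically by keyword, then by file name. */
static int compare_entries(const void *a, const void *b)
{
    const struct entry *x = a;
    const struct entry *y = b;
    int c = strcmp(x->keyword, y->keyword);
    return c != 0 ? c : strcmp(x->file, y->file);
}

/* Sort the collected (keyword, file) pairs into index order. */
void build_index(struct entry *entries, size_t n)
{
    qsort(entries, n, sizeof entries[0], compare_entries);
}

/* Binary search for the first entry of `keyword` in a sorted index.
   Returns its position, or -1 if the keyword is absent. */
long find_first(const struct entry *index, size_t n, const char *keyword)
{
    size_t lo = 0, hi = n;            /* half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(index[mid].keyword, keyword) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo < n && strcmp(index[lo].keyword, keyword) == 0)
        return (long)lo;
    return -1;
}
```

Because all entries for one keyword sit next to each other after sorting, the files for that keyword can be read off by stepping forward from the position find_first returns until the keyword changes.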

On finding a keyword in the index, the search engine applet looks up the relative URL of the first HTML file in which that keyword occurred. It then retrieves the title and description from this file. It gets the title from between the file's <title> and </title> tags. It gets the description from the 'content' part of the file's 'description' meta tag. It then displays these in the applet's own window area.
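
Pulling the title and description out of a fetched HTML file can be sketched in C with plain string searching, as below. This is an illustration under stated assumptions, not the applet's own method: the function names are hypothetical, the tags are assumed to be in lower case, and the description meta tag is assumed to be written exactly as name="description" followed by content="..." in double quotes.

```c
#include <string.h>

/* Copy the text found between the markers `open` and `close` in `html`
   into `out`. Returns 1 on success, 0 if either marker is missing. */
int extract_between(const char *html, const char *open, const char *close,
                    char *out, size_t out_size)
{
    const char *start = strstr(html, open);
    if (start == NULL || out_size == 0)
        return 0;
    start += strlen(open);

    const char *end = strstr(start, close);
    if (end == NULL)
        return 0;

    size_t len = (size_t)(end - start);
    if (len >= out_size)
        len = out_size - 1;           /* truncate rather than overflow */
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}

/* Title: the text between <title> and </title>. */
int extract_title(const char *html, char *out, size_t out_size)
{
    return extract_between(html, "<title>", "</title>", out, out_size);
}

/* Description: the content part of the description meta tag
   (assumed attribute order and quoting, see the note above). */
int extract_description(const char *html, char *out, size_t out_size)
{
    return extract_between(html, "<meta name=\"description\" content=\"",
                           "\"", out, out_size);
}
```

A real page may write its tags in upper case or order the attributes differently, so a robust version would need a more tolerant match than strstr.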

When the user clicks the 'Next' button, the applet retrieves and displays the titles and descriptions of the other files in the list one by one.

Multiple Keywords

If you enter more than one keyword for a given search, the search engine applet proceeds as follows. A full shortlist of HTML documents is retrieved for the first keyword. The first keyword thus always determines the length of the shortlist. It is the primary criterion for the search.

Each HTML document in the shortlist is then ranked according to how many of the subsequent keywords also appear in its keywords meta tag. The more keywords it contains, the higher its rank. The shortlist of relevant web pages is then re-ordered according to rank. The higher a document's rank, the closer it appears to the beginning of the shortlist.
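
The ranking step just described can be sketched in C as follows. This is only an illustration of the idea, not the applet's code: the struct document layout, the limit of 32 keywords per document and the function name rank_shortlist are all assumptions. The shortlist is taken to have been retrieved already for the first keyword, so only the subsequent keywords are passed in.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_DOC_KEYWORDS 32   /* assumed per-document limit */

/* A shortlisted document and the keywords from its meta tag.
   (Hypothetical layout, chosen for this sketch.) */
struct document {
    const char *url;
    const char *keywords[MAX_DOC_KEYWORDS];
    int keyword_count;
    int rank;                 /* filled in by rank_shortlist() */
};

/* Return 1 if `word` is among the document's keywords, else 0. */
static int has_keyword(const struct document *doc, const char *word)
{
    for (int i = 0; i < doc->keyword_count; i++)
        if (strcmp(doc->keywords[i], word) == 0)
            return 1;
    return 0;
}

/* Higher rank sorts nearer the beginning of the shortlist. */
static int by_rank_descending(const void *a, const void *b)
{
    const struct document *x = a;
    const struct document *y = b;
    return (y->rank > x->rank) - (y->rank < x->rank);
}

/* Rank each document by how many of the subsequent keywords it lists,
   then re-order the shortlist. qsort() is not a stable sort, so the
   order of documents with equal rank is unspecified here. */
void rank_shortlist(struct document *docs, size_t n,
                    const char *const *extra, int extra_count)
{
    for (size_t d = 0; d < n; d++) {
        docs[d].rank = 0;
        for (int k = 0; k < extra_count; k++)
            docs[d].rank += has_keyword(&docs[d], extra[k]);
    }
    qsort(docs, n, sizeof docs[0], by_rank_descending);
}
```

For a search on 'poverty welfare tax', a document listing all three keywords would receive rank 2 and move to the head of the shortlist, ahead of one listing only 'poverty' and 'welfare' (rank 1) and one listing only 'poverty' (rank 0).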

Why Meta Tags?

Why does the indexer extract keywords only from meta tags and not from the body or text of the document itself?

Because at least half the keywords a user will think of when looking for information on a given subject will not actually appear in the content of the document they are looking for. The content will appear in the form of phraseology, which far more powerfully expresses the notions concerned than would the large keywords thought of by the user.

Conversely, many large and specific words — potential keywords — appear in the body (text) of the document. While some may correctly be key to the subject of the document, many may not. Many of them may have their legitimate part to play in the text, but do not convey what the document is essentially about. An automatic indexer would blindly include them, whether they be relevant or not to the purpose of finding the right document for the user of a search engine.

Using the 'keyword' meta tag gives the human indexer full control over what keywords his document will be indexed under. This results in search engine listings in which the documents are 100% relevant to the subject matter being sought.

Why A Local Search Engine?

Why have a local in-site search engine at all? Why not simply let the user find stuff on this site using the major public search engines?

It is a sad fact that when technology provides a way, bureaucracy takes it away again. Once upon a time, when the Internet was essentially an academic facility, search engines simply indexed what was there. If it existed, users could find it. Not so now. And increasingly not so. The reason is that the Internet has been all but taken over by commercial interests. Consequently, what was a perfectly workable system has been driven completely pear-shaped by petty self-interest.

To attract potential custom to their sites, commerce has employed underhanded tactics like padding their keyword meta tags with false attractors. In other words, a commercial site selling trucks or modems will put keywords like 'sex' and 'erotic' simply to attract people to the site, even though their site contains nothing erotic or sexy.

This has become a problem for the major Internet search engines. Consequently, they have applied various rules in an attempt to combat this abuse of meta tags. Some have started to ignore keyword meta tags, extracting keywords only from content — the body of the document. Others exclude documents whose keyword meta tags contain any keyword which does not appear in the text of the document — a situation which is very likely in properly indexed documents.

Unfortunately, whatever rules or combination of such rules are applied, they generally tend to penalise, most of all, those documents which have been professionally indexed according to best practice. As a result, particularly since the summer of 1999, properly indexed sites have all too frequently found themselves wrongly excluded from the major search engines.

Alongside the major automatic search engines are the major Internet indexes. These are built by human web surfers, who are given lists of web sites and are employed to examine and categorise the content they find at each site. The surfer or editor then decides whether or not the site should be included in their index.

But what criteria do these professional surfers use to determine whether or not a given site should be included or rejected? Who knows? They could be many and various. However, one must now at least suspect that one of these criteria will be whether or not the site is likely to provide a source of profit to the major Internet index concerned. One thing is obvious. A single human being, working according to prescribed criteria, cannot possibly second-guess what a world full of individual Internet users is and is not interested in, or what they should or should not be allowed to find.

This is, in effect, censorship of the Internet by the back door. It may not be under the control of a single authority. However, the fact that every participant is now essentially driven by the commercial prerogative means that this collective censorship is narrowly focused upon commercial self-interest and away from the free and open exchange of any and all knowledge and information.

Indeed, this site has been dropped from many search engines on which it consistently appeared for the first 18 months of its existence on the Internet. Furthermore, I have found that without trawling through thousands of entries in a search results listing, it is impossible to resolve most of the information content of this site using a major 'public' search engine. This is the reason for my writing this search engine applet and its associated off-line indexer.

Within this site, unlike in the Internet at large, a strict discipline is followed regarding the proper use of keyword meta tags. That is why, within this site, this search engine applet can provide far more effective results than can a major 'public' search engine.

Technical Details

Functional Model

I originally wrote this search engine according to the client-server model. It had a server-side index searcher called index.class. This contained search and retrieval methods which were invoked by the enquiring client-side applet via RMI [Remote Method Invocation]. It searched for keywords and retrieved the relevant URLs from a highly-tuned dataset comprising 6 data files. The applet simply handled the user input and the presentation of the results.

Unfortunately, there was a big problem with this. It required Java executables running server-side. This is all very well for large corporations. However, there is no way a lowly unemployed programmer like me would be allowed to run an executable on his ISP's mighty server. Certainly not on any web site service tariff that can be afforded by anybody existing on this miserly pittance called welfare. I therefore had no choice but to take the whole thing client-side, leaving nothing but passive files sitting on the server. As far as I can see, Java makes no provision for the random access of files across the Internet. In other words, you cannot write things like:

RandomAccessConnection rac = url.openRandomAccessConnection();

Nor should such provision be made. With RMI there is no point. And it is most unlikely that the economic and commercial restrictions imposed upon the likes of me would even occur to the inhabitants of Sun Microsystems.

So take the whole thing client-side is what I did. Index and all. This restricted the size of the index. Nevertheless, it is possible to contain the index of a fair sized web site client-side without any trouble. I'm not going to explain how the applet works at the moment, but here's the source code so you can work it out for yourself.

  1. the main applet class
  2. the image loader class
  3. the title and summary display panel class
  4. the message display panel class
  5. the summary number display panel class
  6. the world map background image display class

Building the Search Engine's Index

This process is done by me off-line whenever the web site has undergone substantial modification. The process is performed by a program called spider.java.

Applet Not Working?

Sadly, since early 2019, Java programs will not run at all via the Web. See below to download and run the program directly on your computer. You need Java installed on your computer.

  1. Download the file index.jar and move it to your home directory.
  2. In your home folder, right-click on 'index.jar' and select 'Extract here'.
    A new folder called 'index' should appear in your home folder.
  3. Press Ctrl-Alt-T to open a terminal.
  4. Go into your new directory [folder] 'index':
    cd index
  5. Enter the command:
    java index -j http://robmorton.20m.com/

The search applet should now open on your desktop.

Note: mainstream warnings notwithstanding, this program will neither blow up your computer nor wreak any other kind of fanciful mischief. It simply writes on your screen. In the good old days it simply ran embedded within the web page where its static image is now displayed. The embedded applet still runs in pre-2017 versions of browsers with Java 1.6 installed and the Web Start version still runs in pre-2019 versions of browsers with Java 1.8 installed. To read the rancid history of this sad retrogression in Web functionality, please click here.

There is nothing wrong with this search engine applet. However, here are a few reasons why it may not be working in your browser and what you might be able to do to correct the problem. Make sure your browser allows pop-ups, plus the execution of Java and JavaScript.

Application Blocked?

If you are using Microsoft Windows and a Security Warning box pops up saying that the application has been blocked from running because it is "untrusted" please click here. If you get similar messages with Linux, please click here.

Can't Enter Keywords

You're probably using the Opera browser. Click in the middle of the applet. Then click in the narrow space between the applet and the links just below it. Then click in the text field. You should be able to enter keywords now. This is an inexplicable anomaly of the Opera browser. Also, with Opera, the first button you click must be clicked twice before it works.

Applet Doesn't Start

Browsers are updated from time to time. Sometimes an update can carry a serious bug that was not there before. An example is the Firefox browser circa April 2012: when you press the 'View Document' button, Firefox takes about 3 to 5 minutes to get around to loading the document. The remedy is to use another browser, the least problematic and most stable of which I have found to be SeaMonkey.

Applet Is Blank

All you get is a blank grey rectangle marking the area of the applet's window, but nothing else appears?

Check the Java Console of your browser to see what kind of Java "Exception" has occurred. Perhaps you simply need to update your version of the Java Runtime Environment (JRE) to at least the version with which I last compiled the applet.

The "security" functions in some of the latest browsers can be problematic. On some settings, the browser does not permit applets to run at all. Other settings permit only so-called "certified" applets to run. That is, applets whose authors have registered the applet with some "authority" or other and acquired a digital certificate for the applet. I can't afford the cost of registration. Some browsers give you the option of allowing applets from a particular web site to run in your browser. Try to find how to configure your browser to allow applets from my website to run within your browser.

Applet Displays A Red Error Message

A text field and various buttons appear on the applet but you also see a very technical-looking red error message near the bottom of the applet area.

This probably means that your browser is allowing the applet to run but is denying it permission to download its own index from my server. Again, it is a problem with security settings. Somewhere in your browser's configuration (or settings) menu there should be an option for allowing an applet to download data either generally, or from specific named web sites.

This overly tight security is only necessary with commercial web sites because some of them try to download programs to run natively within your computer. My web site is completely non-commercial. Consequently, I have no motive to want to load programs into your computer. Besides, I don't think this is possible with Java. As I am led to believe, this is only possible with something called ActiveX, about which I know nothing.

©22 October 1999, modified 10 July 2009, 10 April 2012 Robert John Morton