Outline:
As a beginning note, any discussion of the web involves a lot of jargon and acronyms. Don't be overwhelmed: everyone else had to learn these, and so can you.
What is the Internet and the World Wide Web?
The Internet is a "network of networks": a wide variety of computers around the globe that all speak the same "language" (TCP/IP) to talk to each other. The Internet began in the late 1960s as an Advanced Research Projects Agency (ARPA) project. ARPA was an agency of the US Department of Defense that explored a wide variety of experimental technologies. In the late 1960s, the military became interested in how computers could efficiently trade information. In those days, computers were expensive, fragile, and large. The military had a wide variety of projects that needed computers, but moving everyone working on such a project to where the computer was wasn't always possible. It would be much easier for everyone to work where they normally did and connect to the computer over a network as necessary, using cheaper, smaller computers or terminals. As an added bonus, the military was also interested in designing a network that could survive when some of its computers were down (or destroyed in a war).
The ARPAnet was the fruit of this labor. In its early years it connected a few dozen universities and private research organizations, and it grew very little for many years, in part because of the huge cost of a network link. One interesting thing about the ARPAnet, which is still true of today's Internet, is that nobody is in charge. Each computer is run by its own operator, and typically each machine is configured to accept messages from any other computer on the network. There is no overriding "main machine", in large part because such a machine would be vulnerable to sabotage or destruction. This feature is largely responsible for the explosive growth of the Internet over the past few years: anyone can hook up a machine to the Net with very little fuss. (When you connect a computer to the dorm's Ethernet lines, or connect through SLIP or PPP with a modem, your computer becomes part of the Internet.)
The first two uses of the ARPAnet, remote login (Telnet) and file transfer (FTP), are still important today.
Very shortly after the ARPAnet became operational, electronic mail (email) was added to the above two programs. Email sends a message from one user to another, possibly on a different machine many thousands of miles away. One of the first large-scale uses of email was the SF-LOVERS mailing list: anyone who sent a message to that email address had it forwarded to everyone else on the list. (This caused more than a bit of trouble for the owners of the list when it was discovered what was being done with military-funded networks.)
Through the 1970s and early 1980s the ARPAnet grew slowly. ARPA eventually handed control of the main links over to the civilian National Science Foundation. Other networks such as BITNET, DECNET, and FIDONET also grew during this time, each speaking its own language. Eventually these networks were joined to what had become the NSFnet through gateway computers that could speak both TCP/IP and the other network's language, forming the much larger Internet.
The explosive growth in the Internet over the past few years has been fueled by the popularity of its most recent development: the World Wide Web. The WWW was developed at CERN, the European high-energy physics research lab, as a way to distribute information and data to a large number of scientists, all on different computers. The WWW uses what is known as a "client-server" model. The client program, also called a web browser, sends a request for a document to a web server, and the server sends the information back. This made it easy to keep a central storehouse of information (web pages) that all the scientists could read from their own computers. The WWW was originally designed to show "hypertext", where the user could click on a word or phrase and switch to another document. This was very useful for help files and the like, but one thing was still missing. Until then, Internet traffic had been almost entirely text-based: you could transmit pictures and reassemble them at the other end, but there was no easy way to show pictures and text together the way you would see them in a book.
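The client-server exchange described above can be sketched with Python's standard library. This is a modern illustration, not CERN's original software: a toy server (the `PageHandler` class and the page it returns are made up for the demo) runs on the local machine, and a client asks it for a document and gets the page back, together with a label saying what kind of data it is.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class PageHandler(BaseHTTPRequestHandler):
    """Toy web server: answers every GET request with the same small HTML page."""

    def do_GET(self):
        body = b"<html><head><title>Chem 203</title></head><body>Hello from the server</body></html>"
        self.send_response(200)                        # status line: request succeeded
        self.send_header("Content-Type", "text/html")  # tells the client what kind of data follows
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def fetch(url):
    """Client side: send a request, return (content type, document text)."""
    # An empty ProxyHandler makes sure the request goes straight to the server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as response:
        return response.headers.get_content_type(), response.read().decode("utf-8")


# Port 0 asks the operating system for any free port on this machine.
server = HTTPServer(("127.0.0.1", 0), PageHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/index.html" % server.server_address[1]

content_type, page = fetch(url)
print(content_type)  # text/html
print(page)

server.shutdown()
server.server_close()
```

The server never needs to know what kind of computer the client is running on; all either side has to agree on is the format of the request and the reply.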
The development of Mosaic in the early 1990s changed all that. The first widely used graphical web browser, it let people put both hypertext and pictures on the same page, although that's about all it did. (For example, you couldn't put a picture next to text; a picture had to have a space all its own.) Still, the ease of use of Mosaic and its kin led to a very rapid growth in the use of the Internet. Development of web browsers continues at a torrid pace today, with new versions coming out every six months, each able to handle more types of data. There is a wide variety of acronyms and buzzwords that might be helpful to know about the web, so I'll list some of them below:
Using Netscape as a Web browser
The two most common web browsers in use today are Netscape's Navigator (often just called Netscape) and Microsoft's Internet Explorer. Both are available for Windows machines and Macintoshes; Netscape is also available for most forms of Unix and OS/2. For the most part they work in very similar ways and are roughly equivalent in features, although some of the names are different. For the purposes of this document, I'll use Netscape as the example.
Starting a web browser is easy: it's just like starting any other program. Double-click its icon, and the browser will load. Depending on how the browser is set up, it may show no page, or it may load the default page. Once it starts you will see a page as well as the toolbar, shown below for Netscape Navigator 3.0 running on the Macintosh. The Windows version has some differences in layout, but the features are the same. MS Internet Explorer looks even more different, but works almost exactly the same way.

The title bar at the top of the window shows the current page's title. Here, we're looking at the Chemistry department home page, so the title reflects that.
Below that is a series of menus, then a row of buttons, and finally a bar that shows the address. The address is a URL, a Uniform Resource Locator. Currently, the address is http://www.usc.edu/dept/chemistry. The http at the start means that the page is a web page (http stands for HyperText Transfer Protocol), and www.usc.edu is the machine it lives on. The rest of the address explains where the page is: the server will look in a directory called dept/chemistry to find the file displayed. (By default, if no page is listed in the address, the server sends the page named index.html, so this address is really http://www.usc.edu/dept/chemistry/index.html.) You can type an address into this box directly: typing http://www-scf.usc.edu/~chem203 and pressing Return takes us directly to the Chem 203 page. Note the little animation on the right: while it is running, the page is downloading. This may take a while for some of the larger pages. You can also use the Open Location item under the File menu to enter an address.
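Here is a rough sketch of how an address breaks into its parts, using Python's standard `urllib.parse` module. The `describe` function is hypothetical, and the rule it uses for spotting a directory (no dot in the last part of the path) is only a guess for illustration; a real server checks its own files, and the name of the default page can be configured.

```python
from urllib.parse import urlsplit


def describe(url, default_page="index.html"):
    """Split a web address into (protocol, machine, path on that machine).

    Hypothetical helper: it guesses that a path whose last part has no dot
    names a directory, and then adds the server's default page to it.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    last_part = path.rstrip("/").rsplit("/", 1)[-1]
    if "." not in last_part:  # looks like a directory, not a file
        path = path.rstrip("/") + "/" + default_page
    return parts.scheme, parts.netloc, path


print(describe("http://www.usc.edu/dept/chemistry"))
# ('http', 'www.usc.edu', '/dept/chemistry/index.html')
print(describe("http://www-scf.usc.edu/~chem203"))
# ('http', 'www-scf.usc.edu', '/~chem203/index.html')
```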
If you want something quicker than typing the address every time you want to go to a page, you can create a "bookmark". (MSIE calls them "Favorites" instead, but it's exactly the same idea.) Open the Bookmarks menu and choose "Add Bookmark". Once you've done this, you can open the Bookmarks menu again and you will see an entry for the site you just added. Choosing it will take you to that page. By choosing "Edit Bookmarks" you can move your bookmarks around, sort them however you wish, and place them in subfolders, which will show up as submenus in the Bookmarks menu.
Once you have visited a page, you may want to follow a link. By default, links are shown as underlined text. Simply clicking on one will take you to the page that the link points to. Links can do other things as well: some may open a new web browser window, others may send email to someone. Links can also be embedded in pictures. The "buttons" down the side of both the USC Chemistry and Chem 203 pages are also links. You can tell this by watching the pointer: it changes from an arrow to a hand when it moves over a link. (There are some images that do not do this; the only way to find out that they are links is to click on them.)
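Behind the scenes, each link is an `<a href="...">` tag in the page's HTML, and a picture link is simply an `<img>` tag placed inside one. The sketch below uses Python's standard `html.parser` to list the links on a small made-up page and sort them into the kinds mentioned above; the `LinkFinder` class, the page, and the email address are all hypothetical. It relies on two HTML conventions: `mailto:` addresses send email, and `target="_blank"` asks for a new window.

```python
from html.parser import HTMLParser


class LinkFinder(HTMLParser):
    """Collects every <a href="..."> on a page and notes what kind of link it is."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return  # only anchor tags are links
        attrs = dict(attrs)
        href = attrs.get("href")
        if href is None:
            return
        if href.startswith("mailto:"):
            kind = "email"
        elif attrs.get("target") == "_blank":
            kind = "new window"
        else:
            kind = "page"
        self.links.append((href, kind))


page = """
<p>See the <a href="http://www-scf.usc.edu/~chem203">Chem 203 page</a>,
<a href="mailto:instructor@example.edu">email the instructor</a>, or click
<a href="syllabus.html" target="_blank"><img src="button.gif"></a>.</p>
"""

finder = LinkFinder()
finder.feed(page)
for href, kind in finder.links:
    print(kind, href)
```

Note that the picture link is found just like a text link; only the way the browser draws it differs.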
Once you have followed a link, you may want to go back to the original page. You can do this with the "Back" button in the button bar. Click on it once and you will move to the last document you read. The browser remembers almost everything that you do, so clicking multiple times will take you back through the documents you have read. The "Forward" button works the same way in the other direction. If you want more control, the "History" entry under the "Communicator" menu will show all of the sites you have visited recently.
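The Back and Forward buttons behave like two stacks of pages. The `History` class below is a hypothetical sketch of that idea, not Netscape's actual code; it assumes, as most browsers do, that following a new link clears the Forward list. (The separate History window is a different, longer list of recently visited sites.)

```python
class History:
    """Back/Forward buttons modelled as two stacks of addresses."""

    def __init__(self, start):
        self.current = start
        self.back_stack = []
        self.forward_stack = []

    def visit(self, url):
        # Following a new link makes the old "forward" pages unreachable.
        self.back_stack.append(self.current)
        self.forward_stack.clear()
        self.current = url

    def back(self):
        if self.back_stack:  # do nothing if there is nowhere to go back to
            self.forward_stack.append(self.current)
            self.current = self.back_stack.pop()
        return self.current

    def forward(self):
        if self.forward_stack:
            self.back_stack.append(self.current)
            self.current = self.forward_stack.pop()
        return self.current


h = History("http://www.usc.edu/dept/chemistry")
h.visit("http://www-scf.usc.edu/~chem203")
h.visit("http://www.mdli.com/")
print(h.back())     # http://www-scf.usc.edu/~chem203
print(h.back())     # http://www.usc.edu/dept/chemistry
print(h.forward())  # http://www-scf.usc.edu/~chem203
```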
The "Home" button takes you to whatever page you have listed as your home page in your preferences. "Search" will take you to a search engine, "Guide" gives you a list of things that Netscape thinks are interesting, "Print" should be obvious, "Security" will tell you about the page's security rating (useful if you are sending things like credit card numbers over the net), and "Stop" stops loading the page if you are tired of waiting.
The "Reload" button is a bit confusing. In general, the Internet is fairly slow, so web browsers try to speed things up by "caching" (keeping local copies of) images and text that you have seen recently. For example, if you are visiting a large site, chances are that some of the image buttons on each page are the same. The browser can cache these on your hard drive and load them from there rather than downloading them from the Net over and over. However, it's also possible that the page has changed since you last read it: the browser looks in its cache first, finds the page and displays it, oblivious to the fact that the cached page is not the same as the new one. Hitting "Reload" will force the browser to go back to the site and get the latest version.
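The stale-page problem, and how Reload gets around it, can be shown with a toy cache. This is a minimal sketch under stated assumptions: `CachingBrowser` is hypothetical, a plain function stands in for downloading from the Net, and the page text is made up. Real browsers also check dates and expiry times before trusting a cached copy, which this sketch ignores.

```python
class CachingBrowser:
    """Toy browser cache; fetch_from_net stands in for a real download."""

    def __init__(self, fetch_from_net):
        self.fetch_from_net = fetch_from_net
        self.cache = {}  # address -> saved copy of the page

    def load(self, url, reload=False):
        # Normal loads reuse the saved copy if there is one; Reload always re-fetches.
        if reload or url not in self.cache:
            self.cache[url] = self.fetch_from_net(url)
        return self.cache[url]


# A pretend web site whose page changes after our first visit.
site = {"http://www-scf.usc.edu/~chem203": "Exam on Monday"}
browser = CachingBrowser(lambda url: site[url])

url = "http://www-scf.usc.edu/~chem203"
print(browser.load(url))               # Exam on Monday
site[url] = "Exam moved to Wednesday"  # the page changes on the server
print(browser.load(url))               # Exam on Monday  (stale copy from the cache)
print(browser.load(url, reload=True))  # Exam moved to Wednesday
```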
Plug-ins and installation of Chime
Once the WWW began to become popular, a common problem arose. Web browsers typically understood only a few data types: HTML, GIF and JPEG pictures, and perhaps MPEG movies. Yet a web server could send any type of data it wanted, and people quickly decided that they wanted a way to access that data through the web.
One easy way was through "helper applications". These are separate programs installed on the computer running the web browser. When a web server answers a request for a document, it attaches a header to the front of it saying what kind of data the file is. For example, the header line for HTML text is "Content-type: text/html". The "text/html" part is what is known as a MIME type. Any time a company decides to make a new type of data available over the web, it creates a new MIME type and some reader program to understand it. Apple added "video/quicktime" for its QuickTime movies, and the PDB format, used for most of the molecules in this course, has the MIME type "chemical/x-pdb".
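A server usually decides which MIME type to announce from the file's name. The sketch below uses Python's standard `mimetypes` module to build such a header line; the file names are made up, and registering `chemical/x-pdb` by hand is an assumption made because not every system's built-in table knows about it. (HTTP header names are case-insensitive, so "Content-type" and "Content-Type" mean the same thing.)

```python
import mimetypes

# Not every system's table maps .pdb to chemical/x-pdb, so register it explicitly.
mimetypes.add_type("chemical/x-pdb", ".pdb")


def content_type_header(filename):
    """Build the header line a server might put in front of a file."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    # Unknown extensions fall back to a generic "some binary data" type.
    return "Content-type: " + (mime_type or "application/octet-stream")


for name in ["index.html", "crambin.pdb", "notes.pdf"]:
    print(content_type_header(name))
```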
When a browser gets a document, it looks through a list of MIME types to find out what to do with the file. In the case of HTML, it already knows what to do. In the case of something like PDB, it checks whether there is an entry in its MIME type database, and you can configure this database yourself. A program known as RASMOL understands these kinds of documents: if you wanted to use RASMOL to view a PDB file, you'd go into the browser's preferences, add the chemical/x-pdb MIME type and tell the browser that you wanted RASMOL to open these files. When you came across such a file, the web browser would send the file to RASMOL, which would then display it.
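The lookup the browser does can be pictured as a simple table. The `helpers` dictionary and `handle` function below are hypothetical stand-ins for the browser's preferences, and what happens with an unknown type (here, asking the user) is an assumption made for the sketch.

```python
# Hypothetical preferences table: MIME type -> what the browser should do.
# "inline" means the browser draws it itself; anything else names a helper program.
helpers = {
    "text/html": "inline",
    "image/gif": "inline",
    "image/jpeg": "inline",
    "chemical/x-pdb": "RASMOL",  # the entry you would add in the preferences
}


def handle(mime_type):
    """Decide what to do with a document, given the type the server announced."""
    action = helpers.get(mime_type)
    if action is None:
        return "ask the user what to do"  # no entry for this type
    if action == "inline":
        return "display in the browser window"
    return "hand the file to " + action  # start the helper application


print(handle("text/html"))       # display in the browser window
print(handle("chemical/x-pdb"))  # hand the file to RASMOL
print(handle("video/mpeg"))      # ask the user what to do
```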
The problem with this approach is that you quickly end up with a bunch of applications, each showing part of the document, but no way to show the whole document at once. Netscape's solution to this in its 2.0 browser was to add support for "plug-ins". These are programs that can read a piece of data the browser can't understand, much as a helper application does. However, rather than running as a separate program, they display their output in the web browser window along with the text and images that the browser can normally understand. Thus, you can read documents containing many different data types all on the same page. You can see which plug-ins are installed and what data types they can read by choosing "About Plug-Ins" from the Help menu in Netscape for Windows, or from the Apple menu on Macs.
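The difference between a helper and a plug-in can be pictured as a registry of functions that each draw their part of the page in place, so everything ends up in one window. Everything named here (`register`, `render_page`, the fake plug-ins and the page data) is hypothetical; real Netscape plug-ins were compiled code loaded through Netscape's plug-in interface, not Python functions.

```python
from typing import Callable, Dict, List, Tuple

# A plug-in turns raw data into something drawn *inside* the page.
Plugin = Callable[[str], str]
plugins: Dict[str, Plugin] = {}


def register(mime_type: str, plugin: Plugin) -> None:
    """Install a plug-in for one MIME type."""
    plugins[mime_type] = plugin


def render_page(parts: List[Tuple[str, str]]) -> str:
    """Draw every part of a page, in order, into a single 'window'."""
    window = []
    for mime_type, data in parts:
        if mime_type == "text/html":
            window.append(data)                      # the browser handles HTML itself
        elif mime_type in plugins:
            window.append(plugins[mime_type](data))  # a plug-in draws this part in place
        else:
            window.append(f"[no plug-in for {mime_type}]")
    return "\n".join(window)


# Fake plug-ins: one counts the lines of a pretend molecule file, one labels a PDF.
register("chemical/x-pdb", lambda data: f"[3-D model, {len(data.splitlines())} atoms]")
register("application/pdf", lambda data: "[PDF document]")

page = [
    ("text/html", "Here is the molecule we discussed:"),
    ("chemical/x-pdb", "ATOM 1\nATOM 2\nATOM 3"),
    ("application/pdf", "(fake PDF data)"),
    ("video/quicktime", "(fake movie data)"),
]
print(render_page(page))
```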
Two plug-ins that we will use very heavily in this class are Chemscape Chime from MDLI and Adobe Acrobat Reader. The former understands a wide variety of molecule file types and makes it possible to display 3-D models of molecules on a page. The latter understands Portable Document Format (PDF) documents. PDF is a document format that lets the writer include things not possible with HTML, such as equations, drawings, and specific fonts and layouts. The chemistry department uses it for scanned documents like homework and old test solutions.
Both of these plug-ins are already installed on the machines in the campus computer rooms. If you use a web browser in Leavey, KOH or elsewhere, you won't have to worry about these issues. It's not hard to configure your home machine to do the same, though.
First, get a copy of the plug-in. Chemscape Chime is available from MDLI (http://www.mdli.com/) and Adobe Acrobat Reader from Adobe (http://www.adobe.com/). You can download a copy just by clicking on the correct links. Chime exists for all versions of Windows and for MacOS 8.x and 9.x (but not OS X), for both Netscape and Internet Explorer, though it works best with Netscape; Acrobat exists for just about every OS under the sun.
Second, quit your browser. Both the Mac and Windows versions will unpack to an installer program. Double-click on the installer to start it. You'll be asked to find the "plug-ins" directory of your web browser: it's located in the same directory as the program itself. Open that directory and click on "save", "extract", "install" or whatever else the dialog allows you to do.
Third, restart the browser. That's all you should need to do!
Search engines and strategies
By far the biggest problem on the web when it started was finding information. It was like a library that not only had no card catalog, but whose books weren't even in the same building. Unless you knew the exact address of the page you wanted, you couldn't find it. HTML made it easy to put links from one page to another, and fairly quickly people started collecting lists of URLs and placing them on pages. For several people this rapidly grew from a hobby into a business, as it did for the folks behind Yahoo. Yahoo started as a collection of favorite links of two Stanford grad students, who took the time to find links and add them to an organized list. The site grew rapidly until it was consuming over 40% of the total network traffic at Stanford. The two students were asked to take Yahoo elsewhere, and they turned it into a multi-million dollar company.
A more systematic approach was taken by several other companies, which began running "web crawlers": programs that found and followed every link they could, saving the pages as they went. These programs were mated to programs that could search for keywords in a document, and the web search engine was born. All of the indices like Yahoo also include search engines, and some of the search engines also include an index. One advantage of an index is that documents in the index are probably relevant to the subject you are looking for, whereas documents found through a search engine may not be. (Ask a search engine about drugs and you'll get a lot of hits on crime stories, for example.)
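A crawler feeding a keyword index can be sketched in a few lines. This is a toy model under stated assumptions: the four-page `WEB` dictionary is made up and stands in for the real Net, the crawl is breadth-first, and the index simply maps each lower-cased word to the pages containing it (an "inverted index", one common way to do keyword search). Real search engines are far more elaborate.

```python
from collections import deque

# A tiny made-up "web": each page has some text and a list of links.
WEB = {
    "a.html": ("AIDS research timeline", ["b.html", "c.html"]),
    "b.html": ("Hearing aids for sale", ["a.html"]),
    "c.html": ("Drug policy and crime news", ["d.html"]),
    "d.html": ("History of HIV research", []),
}


def crawl(start):
    """Breadth-first crawl: follow every link once, saving each page's text."""
    seen, queue, saved = {start}, deque([start]), {}
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        saved[url] = text
        for link in links:
            if link not in seen:  # never visit the same page twice
                seen.add(link)
                queue.append(link)
    return saved


def build_index(pages):
    """Map each lower-cased word to the set of pages that contain it."""
    index = {}
    for url, text in pages.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index


index = build_index(crawl("a.html"))
print(sorted(index["research"]))  # ['a.html', 'd.html']
print(sorted(index["aids"]))      # ['a.html', 'b.html']
```

Because every word is lower-cased, "AIDS" and "hearing aids" land in the same index entry, which is exactly the kind of irrelevant hit a keyword search can return.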
The problem with the Web now is the opposite of the one it had when it was born. A search for the word "AIDS" in one of the more complete databases (Altavista) returns 472,320 documents as of 8 September. Many are junk and many more may be helpful, but narrowing the list down to the documents you actually want is very difficult, and probably the single hardest part of using the web.
There are a large number of search engines and indices available: some of the better known include
As an example of an index I'll use Yahoo, and for a search engine I'll use Altavista.
To find a subject in an index, you can either look under the subject topics or use the search engine that's attached to the index. For example, if we want to find AIDS in the Yahoo index, we could start by looking under the Health link. This gives a wide listing of topics related to health; we'll choose Disease and Conditions. (The number 3669 means that there are 3669 links under this heading.) We now see a link for AIDS/HIV, with 639 entries, and following it gives us a wide variety of links to choose from.
As an alternative, we can just enter AIDS in the search engine's box. Click in the little text entry box, type AIDS and click Search. We get a list of sections within Yahoo that contain the word AIDS. Note that not all of the sections are things we want: because the search ignores capitalization, we also get the sections for hearing aids and golfing aids, for example. Still, the number of sections is manageable.
Using a search engine is often an exercise in patience and playing until you find the correct word or phrase. There are a number of ways to help your search by narrowing down the list of returned articles.
Let's say that we're interested in the controversy over the discovery of the AIDS virus, which pitted the American Robert Gallo against France's Institut Pasteur. We start by going to Altavista.
Altavista claims on its help page to understand natural language searches, and some of the other search engines do as well. My experience is that this is mostly marketing hype: you'll do better if you learn some of the rules and just use them instead of trying to speak to a computer in English. As an example, I'll type "Gallo and the discovery of HIV" into the search box and press Return. I get about 89,000 documents, of which the first is clearly bogus. Some of the rest might be OK, but we can do a lot better.
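A few simple rules for Altavista include the following (many of the search engines have very similar rules, but you should check their help files). Put a plus sign in front of a word (+HIV) to require that every document returned contain it. Put a minus sign in front of a word (-rethink) to exclude documents that contain it. End a word with an asterisk (rethink*) to match any word that begins with those letters, such as "rethinking".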
Let's give it a try. We'll start with a basic search and gradually refine it.
Search for: Gallo HIV. Returns 23,820 documents. Let's force the engine to include both words.
Search for: +Gallo +HIV. Returns 797 documents. Still too many. A lot of the documents seem to be of the form "HIV doesn't cause AIDS; Gallo is wrong", and many have the word "rethink" or "rethinking" in them. Let's get rid of them.
Search for: +Gallo +HIV -rethink*. Returns 675 documents. Still too many, but we do know that the dispute was with someone in France.
Search for: +Gallo +HIV -rethink* +France. Returns 76 documents. Here we go. This stripped out a lot of the chaff, although some pages are still a bit off-topic. One last cut: since we want to know the history, let's ask for it.
Search for: +Gallo +HIV -rethink* +France +history. Returns 31 documents, of which the first few include "Timeline: A Brief History of AIDS/HIV", "The Politics of HIV and AIDS", and "INSTITUTIONAL RESPONSE TO THE HIV BLOOD TEST PATENT DISPUTE AND RELATED MATTER". All three are clearly relevant to our search.
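To see why each refinement shrinks the list, here is a simplified model of the +, - and * rules applied to six made-up page descriptions. It is only a sketch under stated assumptions: the `search` and `matches` functions and the documents are invented for illustration, plain words are treated as "any of these will do" (which is why an unmarked search returns so much), and Altavista's real engine also ranked its results and supported more operators.

```python
import re


def matches(term, words):
    """True if term is one of the words; a trailing * matches any ending."""
    if term.endswith("*"):
        stem = term[:-1]
        return any(word.startswith(stem) for word in words)
    return term in words


def search(query, documents):
    """Return the titles of documents that satisfy a +/-/* style query.

    Simplified model: +term is required, -term is forbidden, and plain
    terms are OR-ed together (they only filter when nothing is required).
    """
    required, excluded, optional = [], [], []
    for token in query.lower().split():
        if token in ("+", "-"):
            continue  # a stray sign on its own is ignored
        if token.startswith("+"):
            required.append(token[1:])
        elif token.startswith("-"):
            excluded.append(token[1:])
        else:
            optional.append(token)

    hits = []
    for title, text in documents.items():
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        if not all(matches(t, words) for t in required):
            continue
        if any(matches(t, words) for t in excluded):
            continue
        if not required and optional and not any(matches(t, words) for t in optional):
            continue
        hits.append(title)
    return hits


# Made-up pages, invented only to show how each rule narrows the list.
docs = {
    "Timeline": "A history of HIV: Gallo and the Institut Pasteur in France",
    "Rethinking AIDS": "Gallo is wrong, HIV does not cause AIDS. Rethinking the theory.",
    "Patent dispute": "Gallo, HIV and France in the patent dispute",
    "Lab notes": "Notes on Gallo and HIV research in the lab",
    "Gallo family golf": "Tips from the Gallo family on golf",
    "HIV testing": "How an HIV test works",
}

for query in ["Gallo HIV", "+Gallo +HIV", "+Gallo +HIV -rethink*",
              "+Gallo +HIV -rethink* +France",
              "+Gallo +HIV -rethink* +France +history"]:
    hits = search(query, docs)
    print(len(hits), query, hits)
```

With these pages the counts fall 6, 4, 3, 2, 1 as the query is refined, mirroring the pattern of the real searches on a much smaller scale.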
This is the sort of process you have to go through on almost any web search. It's easy to get frustrated, but try not to; if you keep getting junk articles or no results at all, think about other ways to approach the search.