
INTERNET MASTER

Sunday, December 17, 2006

Basic Search Tips

QUICK TIPS

NOTE: These tips will work with most search engines in their basic search option.

  • Use the plus (+) and minus (-) signs in front of words to force their inclusion and/or exclusion in searches.
    EXAMPLE: +meat -potatoes
    (NO space between the sign and the keyword)
  • Use double quotation marks (" ") around phrases to ensure they are searched exactly as is, with the words side by side in the same order.
    EXAMPLE: "bye bye miss american pie"
    (Do NOT put quotation marks around a single word.)
  • Put your most important keywords first in the string.
    EXAMPLE: dog breed family pet choose
  • Type keywords and phrases in lower case to find both lower and upper case versions. Typing capital letters will usually return only an exact match.
    EXAMPLE: president retrieves both president and President
  • Use truncation (or stemming) and wildcards (e.g., *) to look for variations in spelling and word form.
    EXAMPLE: librar* returns library, libraries, librarian, etc.
    EXAMPLE: colo*r returns color (American spelling) and colour (British spelling)
  • Combine phrases with keywords, using the double quotes and the plus (+) and/or minus (-) signs.
    EXAMPLE: +cowboys +"wild west" -football -dallas
    (In this case, if you use a keyword with a +sign, you must put the +sign in front of the phrase as well. When searching for a phrase alone, the +sign is not necessary.)
  • When searching within a document for the location of your keyword(s), use the "find" command on that page.
  • Know the default (basic) settings your search engine uses (OR or AND). This will have an effect on how you configure your search statement because, if you don't use any signs (+, -, " "), the engine will default to its own settings.
  • Know whether or not the search engine you are using maintains a stop word list (see "Stop Words," Lesson 6). If it does, don't use known stop words in your search statement. Also, consider trying your search on another engine that does not recognize stop words.
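The plus, minus and quotation mark tips above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any real search engine works: the function names parse_query and matches are made up for this example, and the matching is a simple substring test on lower-cased text.

```python
import re

def parse_query(query):
    """Split a query into required (+), excluded (-), and optional terms.

    A quoted phrase is kept whole. A sign applies to the keyword or
    phrase that immediately follows it, with no space in between.
    """
    required, excluded, optional = [], [], []
    # Each token: an optional sign, then either a "quoted phrase" or a bare word.
    for sign, phrase, word in re.findall(r'([+-]?)(?:"([^"]+)"|(\S+))', query):
        term = (phrase or word).lower()
        if sign == "+":
            required.append(term)
        elif sign == "-":
            excluded.append(term)
        else:
            optional.append(term)
    return required, excluded, optional

def matches(text, query):
    """Return True if the text satisfies the query (hypothetical toy matcher)."""
    required, excluded, optional = parse_query(query)
    text = text.lower()
    if any(term in text for term in excluded):
        return False
    if not all(term in text for term in required):
        return False
    # With no required terms, at least one optional term must appear.
    return bool(required) or any(term in text for term in optional)

query = '+cowboys +"wild west" -football -dallas'
print(parse_query(query))
print(matches("Cowboys of the Wild West rode horses", query))
print(matches("Dallas Cowboys football schedule", query))
```

The first page passes because it contains both required terms and neither excluded one. The second page is rejected as soon as an excluded word turns up.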

Quick Tips for Boolean Searches

  • In Boolean searches, always enclose OR statements in parentheses.
    EXAMPLE: Yosemite (campgrounds OR reservations)
  • Always use CAPS when typing Boolean operators in your search statements. Most engines require that the operators (AND, OR, AND NOT/NOT) be capitalized. Other engines will accept either CAPS or lower case, so you're on safe ground if you stick to CAPS.
    EXAMPLE: "immune system" AND homeopathic (medicine OR remedy)
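To see why the parentheses matter, here is a small Python sketch that treats each keyword as the set of pages containing it. The page numbers are invented for the illustration; AND becomes set intersection (&) and OR becomes set union (|).

```python
# Toy index (made-up data): each keyword maps to the IDs of pages containing it.
index = {
    "yosemite": {1, 2, 3, 5},
    "campgrounds": {2, 4},
    "reservations": {3, 4, 6},
}

# Yosemite AND (campgrounds OR reservations)
grouped = index["yosemite"] & (index["campgrounds"] | index["reservations"])

# (Yosemite AND campgrounds) OR reservations
ungrouped = (index["yosemite"] & index["campgrounds"]) | index["reservations"]

print(sorted(grouped))    # [2, 3]
print(sorted(ungrouped))  # [2, 3, 4, 6]
```

Without the grouping, pages 4 and 6 slip in even though they never mention Yosemite.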
I believe I have given you the ins and outs of searching the internet. What you search for is up to you.

Yours Truly,
Ferdinand Che.

Search Strategies

STARTING OUT

It's always a good idea to THINK about your search before you begin. Create a search strategy in your head by asking yourself this question:

What do I want to do?

  1. Browse?
  2. Locate a specific piece of information?
  3. Retrieve everything I can on the subject?

Your answer will determine how you conduct your search and what tools you will use.

  1. If you're browsing and trying to determine what's available in your subject area, start out by selecting a subject directory like Yahoo! Then, enter your search keyword(s) into one of the metasearch engines, such as Vivisimo, just to see what's out there.
  2. If you're looking for a specific piece of information, go to a major search engine such as Google, or to a specialized database such as Bureau of the Census (for statistics).
  3. If you want to retrieve everything you can on a subject, try the same search on several search engines. Also, don't forget to check resources off the Web, such as books, newspapers, journals and other print reference sources.

DEFAULTS, AND OTHER STUFF

In your search statement, if you enter more than one keyword without using any accompanying sign, mark or symbol (see Lesson 7 and Lesson 8 for explanations and examples), the search engine will automatically add either the AND or the OR conjunction to link your search terms together. This could radically alter your search in unexpected ways. Be sure you know the defaults (basic settings) of the search engine you are using, as this could explain why your search results may not be what you expected them to be.
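The effect of the default setting is easy to show with a toy example. The sketch below is an assumption-laden illustration: the search function and the three tiny "pages" are invented, and real engines do far more than split text into words.

```python
def search(pages, keywords, default="AND"):
    """Return the titles of pages matching the keywords under a default operator."""
    # AND requires every keyword to appear; OR is satisfied by any one of them.
    combine = all if default == "AND" else any
    hits = []
    for title, text in pages.items():
        words = set(text.lower().split())
        if combine(k.lower() in words for k in keywords):
            hits.append(title)
    return sorted(hits)

# Made-up pages for the illustration.
pages = {
    "A": "hybrid electric vehicles",
    "B": "electric guitars",
    "C": "gas prices",
}

print(search(pages, ["hybrid", "electric"], default="AND"))  # ['A']
print(search(pages, ["hybrid", "electric"], default="OR"))   # ['A', 'B']
```

The same two keywords return one page under AND and two under OR, which is exactly the kind of surprise the default setting can cause.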

Strange things can happen for other reasons as well. Sometimes the relevance ranking systems that search engines use (and which they are reluctant to reveal) can throw off your search by ignoring some of the words in your search statement. This might happen when the search engine recognizes your string of separate keywords as a phrase in its list of pre-determined phrases or when it is responding to its own internal list of "stop words" (see below). Whatever the case, you may never know the real reason why your search retrieves so many irrelevant responses.

STOP WORDS

Stop words are words that many search engines DON'T stop for when searching texts and titles on the web. In fact, in order to cut down on response time, these engines routinely ignore stop words, i.e., small, common words such as adverbs, conjunctions, prepositions, and forms of "to be." Examples include: a, about, an, and, are, as, at, be, by, from, how, i, in, is, it, of, on, or, that, the, this, to, we, what, when, where, which, with. Not all search engines recognize the same stop words. In addition, their lists can and do change frequently. If you initiate a search at a site that maintains a list of stop words and you type any of those words into your search statement (even in phrases surrounded by quotes), they may well continue to be ignored. An exception to this is Google, which has a stop word list but recognizes stop words within phrases surrounded by quotation marks, e.g., "to be or not to be" or "what you see is what you get".
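Here is a rough Python sketch of what stop word filtering does to a search statement. The stop word list is the one given above; the function name effective_terms and the keep_in_phrases switch are assumptions made for this example, with the switch standing in for the Google-style exception for quoted phrases.

```python
import re

# Stop words taken from the examples listed in this lesson.
STOP_WORDS = {
    "a", "about", "an", "and", "are", "as", "at", "be", "by", "from", "how",
    "i", "in", "is", "it", "of", "on", "or", "that", "the", "this", "to",
    "we", "what", "when", "where", "which", "with",
}

def effective_terms(query, keep_in_phrases=True):
    """Return the terms a (hypothetical) engine would actually search for."""
    terms = []
    # Each token is either a "quoted phrase" or a bare word.
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query):
        if phrase:
            if keep_in_phrases:
                # Quoted phrases are searched whole, stop words included.
                terms.append(phrase.lower())
            else:
                # Stop words are dropped even inside quotes.
                terms.extend(w for w in phrase.lower().split() if w not in STOP_WORDS)
        elif word.lower() not in STOP_WORDS:
            terms.append(word.lower())
    return terms

print(effective_terms("the history of jazz"))                           # ['history', 'jazz']
print(effective_terms('"to be or not to be"'))                          # ['to be or not to be']
print(effective_terms('"to be or not to be"', keep_in_phrases=False))   # ['not']
```

On an engine that drops stop words everywhere, the famous phrase shrinks to the single word "not", which explains why such searches can come back full of irrelevant results.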

CREATING A SEARCH STATEMENT

When structuring your query, keep the following tips in mind:
[NOTE: See Lesson 7 for an explanation of the signs and marks used below.]

  • Be specific
    EXAMPLE: Hurricane Hugo
  • Whenever possible, use nouns and objects as keywords
    EXAMPLE: fiesta dinnerware plates cups saucers
  • Put most important terms first in your keyword list; to ensure that they will be searched, put a +sign in front of each one
    EXAMPLE: +hybrid +electric +gas +vehicles
  • Use at least three keywords in your query
    EXAMPLE: interaction vitamins drugs
  • Combine keywords, whenever possible, into phrases
    EXAMPLE: "search engine tutorial"
  • Avoid common words, e.g., water, unless they're part of a phrase
    EXAMPLE: "bottled water"
  • Think about words you'd expect to find in the body of the page, and use them as keywords
    EXAMPLE: anorexia bulimia eating disorder
  • Write down your search statement and revise it before you type it into a search engine query box
    EXAMPLE: +"south carolina" +"financial aid" +applications +grants
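If you like to write your search statement down before typing it, the tips above can be turned into a small helper. This is a sketch only; build_query and its three parameters are invented for this example and assume an engine that accepts the plus, minus and quotation mark signs.

```python
def build_query(required=(), optional=(), excluded=()):
    """Assemble a search statement: required terms first, phrases in quotes."""
    def fmt(term, sign=""):
        term = term.strip().lower()      # lower case finds both cases
        if " " in term:                  # more than one word: make it a phrase
            term = f'"{term}"'
        return sign + term               # no space between the sign and the term

    parts = [fmt(t, "+") for t in required]
    parts += [fmt(t) for t in optional]
    parts += [fmt(t, "-") for t in excluded]
    return " ".join(parts)

print(build_query(required=["South Carolina", "financial aid", "applications", "grants"]))
# +"south carolina" +"financial aid" +applications +grants
```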

ASSIGNMENT:

Assume you are about to start looking for work and need to write a cover letter. What search string would you use? Go to Google and select a few of the following strings to search:

  1. "cover letter" "job search"
  2. "cover letter" +resume
  3. "cover letter" +template +form
  4. "cover letter" +example
  5. "cover letter" +sample "helpful tips"

Scan the results page for each search you conduct and see if you can tell which searches seem to be the most productive and why.

Yours Truly,
Ferdinand Che

Evaluating Websites

CHECKING THE SOURCE

You can expect to find everything on the web: silly sites, hoaxes, frivolous and serious personal pages, commercials, reviews, articles, full-text documents, academic courses, scholarly papers, reference sources, and scientific reports. How do you sort it all out?

READING WEB ADDRESSES

First, you need to know how to read a web address, or URL (Uniform Resource Locator). Let's look at the URL for this tutorial:

http://www.sc.edu/beaufort/library/pages/bones/bones.shtml

Here's what it all means:

  • "http" means hypertext transfer protocol and refers to the format used to transfer and deal with information
  • "www" stands for World Wide Web and is the general name for the host server that supports text, graphics, sound files, etc. (It is not an essential part of the address, and some sites choose not to use it)
  • "sc" is the second-level domain name and usually designates the server's location, in this case, the University of South Carolina
  • "edu" is the top-level domain name (see below)
  • "beaufort" is the directory name
  • "library" is the sub-directory name
  • "pages" and "bones" are folder and sub-folder names
  • the second "bones" is the file name
  • "shtml" is the file type extension. It marks a page written in HTML (hypertext mark-up language, the language the computer reads) that uses server-side includes. The addition of the "s" indicates that the server will scan the page for commands that require additional insertion before the page is sent to the user.
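The same breakdown can be done with Python's standard library. This is a minimal sketch using urlsplit and PurePosixPath; the variable names are just labels chosen to match the list above.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

url = "http://www.sc.edu/beaufort/library/pages/bones/bones.shtml"
parts = urlsplit(url)
path = PurePosixPath(parts.path)

scheme = parts.scheme                     # "http"
host_labels = parts.hostname.split(".")   # ["www", "sc", "edu"]
top_level_domain = host_labels[-1]        # "edu"
second_level_domain = host_labels[-2]     # "sc"
directories = path.parts[1:-1]            # ("beaufort", "library", "pages", "bones")
file_name = path.stem                     # "bones"
extension = path.suffix                   # ".shtml"

print(scheme, host_labels, directories, file_name, extension)
```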

Only a few top-level domains are currently recognized, but this is changing. Here is a list of the domains that have been in operation for the past several years and are generally accepted by all:

  • .edu -- educational site (usually a university or college)
  • .com -- commercial business site
  • .gov -- U.S. governmental/non-military site
  • .mil -- U.S. military sites and agencies
  • .net -- networks, internet service providers, organizations
  • .org -- U.S. non-profit organizations and others

In mid-November 2000, the Internet Corporation for Assigned Names and Numbers (ICANN) voted to accept seven new suffixes, which are already in operation or preparing to come into operation:

  • .aero -- restricted use by air transportation industry
  • .biz -- general use by businesses
  • .coop -- restricted use by cooperatives
  • .info -- general use by both commercial and non-commercial sites
  • .museum -- restricted use by museums
  • .name -- general use by individuals
  • .pro -- restricted use by certified professionals and professional entities
NOTE: Because the Internet was created in the United States, the ".us" code was not originally assigned to U.S. domain names; however, it is used to designate state and local government hosts, including many public schools. Other countries have their own two-letter codes as the final part of their domain names, e.g., .uk for United Kingdom; .ca for Canada; .fr for France, etc.

For a list of Internet Country Codes, go to: ISO's list of Country Codes

DETERMINING PAGE AUTHORSHIP

You can tell a lot about the authenticity of a page by finding out all you can about its author/publisher.

Ask yourself this: Who is responsible for the page you are accessing? Is it a governmental agency or other official source? A university? A business, corporation or other commercial interest? An individual? As a rule of thumb, you can generally rely on the GOV and EDU hostnames to present accurate information. The NET, ORG, MIL, and COM domains are more likely to host pages with their own personal or organizational agendas and might require additional verification.
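The rule of thumb above can be written as a quick first-pass check. The sketch below is only that: describe_source is a made-up helper, the domain descriptions come from the list earlier in this lesson, and a "no extra check needed" answer is a starting point, never proof that a page is accurate.

```python
from urllib.parse import urlsplit

# Descriptions taken from the top-level domain list in this lesson.
TLD_MEANINGS = {
    "edu": "educational site",
    "com": "commercial business site",
    "gov": "U.S. governmental/non-military site",
    "mil": "U.S. military sites and agencies",
    "net": "networks, internet service providers, organizations",
    "org": "U.S. non-profit organizations and others",
}

# Rule of thumb from the text: GOV and EDU can generally be relied on.
GENERALLY_RELIABLE = {"gov", "edu"}

def describe_source(url):
    """Return (top-level domain, description, needs_additional_verification)."""
    host = urlsplit(url).hostname or ""
    tld = host.rsplit(".", 1)[-1].lower()
    kind = TLD_MEANINGS.get(tld, "other or country-code domain")
    return tld, kind, tld not in GENERALLY_RELIABLE

print(describe_source("http://www.whitehouse.gov"))
print(describe_source("http://www.whitehouse.org"))
```

Two addresses that differ only in their last three letters get very different answers, which is why it pays to read the whole URL.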

CHECKING THE VITAL INFORMATION

A reputable Web page will usually provide you with the following information:

  • Last date page updated
  • Mail-to link for questions, comments
  • Name, address, telephone number, and email address of page owner

Now ask yourself this: If the page owner is not readily recognizable, does he provide you with credentials or some information on his sources or authority?

CHECKING THE CONTENT

On the Web, each individual can be his/her own publisher, and many are. Don't accept everything you read just because it's printed on a web page. Unlike scholarly books and journal articles, web sites are seldom reviewed or refereed. It's up to you to check for bias and to determine objectivity. Who sponsors the page? The Flat Earth Society? Hmmm ...... Who is linking to the page, and what links to other pages does the page itself maintain?

Look to see if the page owner tells you when the page was last updated. Is the information current? Can it be verified at other, similar sites?

Try to distinguish between promotion, advertising, and serious content. This is getting to be more difficult, as an increasing number of pages must look to commercial support for their continuance.

Watch out for deliberate frauds and hoaxes. Some folks really enjoy playing games on the Web. Take a look at these two Web pages:

The White House
http://www.whitehouse.gov
The White House
http://www.whitehouse.org

ASSESSING WEB PAGE STABILITY

There is no way to freeze a web page in time. Unlike the print world with its publication dates, editions, ISBN numbers, etc., web pages are fluid. There's no bibliographic control on the Web. The page you cite today may be altered or revised tomorrow, or it might disappear completely. The page owner might or might not acknowledge the changes and, if he relocates the page, might or might not leave a forwarding address.

Try to assess the stability of the pages you reference. Again, one of the best ways to do this is to look closely at the page sponsor, last date updated, and the authority of the author(s).

When you are writing a paper and using web pages as source material, keep a backup of what you find on the Web (either as a printout or saved to disk) so that you can verify your sources later on if need be.

Yours Truly,
Ferdinand Che

Subject Directories

WHAT ARE SUBJECT DIRECTORIES?

Subject directories, unlike search engines, are created and maintained by human editors, not electronic spiders or robots. The editors review and select sites for inclusion in their directories on the basis of previously determined selection criteria. The resources they list are usually annotated. Directories tend to be smaller than search engine databases, typically indexing only the home page or top level pages of a site. They may include a search engine for searching their own directory (or the web, if a directory search yields unsatisfactory or no results).

HOW DO SUBJECT DIRECTORIES WORK?

When you initiate a keyword search of a directory's contents, the directory attempts to match your keywords and phrases with those in its written descriptions. Subject directories come in assorted flavors. There are general directories, academic directories, commercial directories, portals and now, vortals. Portals are directories that have been created or taken over by commercial interests and then reconfigured to act as gateways to the web. These portal sites not only link to popular subject categories, they also offer additional services such as email, current news, stock quotes, travel information and maps. Vortals, or vertical portals, (See Lesson 4 for examples) are subject-specific directories, as opposed to the broader, more generalized smorgasbord of subjects and other links commonly found in portals.

NOTE: Today, the line between subject directories and search engines is blurring. Most subject directories have partnered with search engines to query their databases and search the web for additional sources, while search engines are acquiring subject directories or creating their own.

Two subject directories have partnered with and developed their own search engines that are very powerful. You will see them listed in both the search engine and the subject directory categories. Check out the different engine and directory "looks" below:

WHAT ARE THE PROS AND CONS OF SUBJECT DIRECTORIES?

PROS:
Directory editors typically organize directories hierarchically into browsable subject categories and sub-categories. When you're clicking through several subject layers to get to an actual Web page, this kind of organization may appear cumbersome, but it is also the directory's strength. Because of the human oversight maintained in subject directories, they have the capability of delivering a higher quality of content.

They may also provide fewer results out of context than search engines.

CONS:
Unlike search engines, most directories do not compile databases of their own. Instead of storing pages, they point to them. This situation sometimes creates problems because, once accepted for inclusion in a directory, the Web page could change content and the editors might not realize it. The directory might continue to point to a page that has been moved or that no longer exists.

Dead links are a real problem for subject directories, as is a perceived bias toward e-commerce sites.

WHEN DO YOU USE SUBJECT DIRECTORIES?

Like the yellow pages of a telephone book, subject directories are best for browsing and for searches of a more general nature. They are good sources for information on popular topics, organizations, commercial sites and products. When you'd like to see what kind of information is available on the Web in a particular field or area of interest, go to a directory and browse through the subject categories.


EXAMPLES OF SUBJECT DIRECTORIES AND PORTALS:

Subject Directories

Portals (subject directories serving as home pages)

I bet you are well on your way to a master's in internet search by now. Why not?

Yours Truly,
Ferdinand Che.

Metasearch Engines

WHAT ARE METASEARCH ENGINES?

Metasearch engines do not crawl the web compiling their own searchable databases. Instead, they search the databases of multiple sets of individual search engines simultaneously, from a single site and using the same interface. Metasearchers provide a quick way of finding out which engines are retrieving the best results for you in your search.

HOW DO METASEARCHERS DISPLAY THEIR RESULTS?

Metasearch engines present the results of their searches in one of two ways:

  1. Single List. Most metasearchers display multiple-engine search results in a single merged list, from which duplicate entries have been removed.
  2. Multiple Lists. Some metasearchers do not collate multiple-engine search results but display them instead in separate lists as they are received from each engine. Duplicate entries may appear.
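The two display styles can be sketched in Python as follows. The engine names and URLs are placeholders invented for this example, and the merge keeps the first occurrence of each URL in the order the engines are listed, which is only one of many possible merging rules.

```python
def single_list(results_by_engine):
    """Merge every engine's results into one list, dropping duplicate URLs."""
    seen, merged = set(), []
    for urls in results_by_engine.values():
        for url in urls:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

def multiple_lists(results_by_engine):
    """Keep one list per engine, exactly as received; duplicates may appear."""
    return {engine: list(urls) for engine, urls in results_by_engine.items()}

# Placeholder data for the illustration.
results = {
    "engine_a": ["http://example.com/a", "http://example.com/b"],
    "engine_b": ["http://example.com/b", "http://example.com/c"],
}

print(single_list(results))
print(multiple_lists(results))
```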

WHAT ARE THE PROS AND CONS OF METASEARCHERS?

PROS:
Metasearch engines can give you a fair picture of what's available across the Web and where it can be found.

Metasearchers are very fast.

CONS:
More and more, metasearch engines seem to be casting smaller nets by relying on subject directories and pay-per-click engines for their Web results.

Metasearch engines don't offer the "salad bar" of search options that individual search engines do. When you initiate a keyword or phrase search on a metasearch engine, you are usually at its mercy as far as how the search is configured and conducted.

Although metasearch engines query a number of individual search engines, not enough query Google, one of the largest and most popular search engines on the Web. (Note: Dogpile and Mamma both search Google)

WHEN DO YOU USE METASEARCH ENGINES?

Use metasearchers when you are in a hurry. Metasearch engines are useful in obtaining a quick overview on a subject and/or unique term.

Use metasearchers when you are conducting a relatively simple search and also when you are not having any luck pulling up documents in your search.

EXAMPLES OF METASEARCH ENGINES:

Do you know what subject directories are? Visit this blog again.

Yours Truly,
Ferdinand Che

About Search Engines

WHAT ARE SEARCH ENGINES?

Search engines are huge databases of web page files that have been assembled automatically by machine.

There are two types of search engines:

1. Individual. Individual search engines compile their own searchable databases of the web.
2. Meta. Metasearchers do not compile databases. Instead, they search the databases of multiple sets of individual engines simultaneously (see Lesson 2).

HOW DO SEARCH ENGINES WORK?

Search engines compile their databases by employing "spiders" or "robots" ("bots") to crawl through web space from link to link, identifying and perusing pages. Sites that no other pages link to may be missed by spiders altogether. Once the spiders get to a web site, they typically index most of the words on the publicly available pages at the site. Web page owners may submit their URLs to search engines for "crawling" and eventual inclusion in their databases.

Whenever you search the web using a search engine, you're asking the engine to scan its index of sites and match your keywords and phrases with those in the texts of documents within the engine's database.
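Crawling and indexing can be imitated on a tiny pretend web. Everything in the sketch below is invented for illustration: WEB is a four-page stand-in for the real thing, and crawl_and_index follows links from a starting page and records which pages contain which words.

```python
from collections import deque

# A pretend web: page name -> (page text, links found on that page).
WEB = {
    "home": ("welcome to the library", ["catalog", "hours"]),
    "catalog": ("search the library catalog", ["home"]),
    "hours": ("library opening hours", []),
    "orphan": ("secret page nobody links to", []),
}

def crawl_and_index(seed):
    """Follow links from the seed page and build a word -> pages index."""
    index = {}
    queue, visited = deque([seed]), {seed}
    while queue:
        page = queue.popleft()
        text, links = WEB[page]
        for word in text.split():
            index.setdefault(word, set()).add(page)
        for link in links:
            if link in WEB and link not in visited:
                visited.add(link)
                queue.append(link)
    return index

index = crawl_and_index("home")
print(sorted(index["library"]))   # ['catalog', 'home', 'hours']
print("secret" in index)          # False
```

A keyword search is then just a lookup in the index. The "orphan" page never shows up because the spider had no link to follow to reach it.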

It is important to remember that when you are using a search engine, you are NOT searching the entire web as it exists at this moment. You are actually searching a portion of the web, captured in a fixed index created at an earlier date.

How much earlier? It's hard to say. Spiders regularly return to the web pages they index to look for changes. When changes occur, the index is updated to reflect the new information. However, the process of updating can take a while, depending upon how often the spiders make their rounds and then, how promptly the information they gather is added to the index. Until a page has been both "spidered" AND "indexed," you won't be able to access the new information.
NOTE: While most search engine indexes are not "up to the minute" current, the engines have partnered with specialized news databases that are. For late breaking news, look for a "news" tab somewhere on the search engine or directory page. Examples include:

* Google Breaking News
* Yahoo! News

WHAT ARE THE PROS AND CONS OF SEARCH ENGINES?

PROS:
Search engines provide access to a fairly large portion of the publicly available pages on the Web, which itself is growing exponentially (see "How Big Is the Internet?")

Search engines are the best means devised yet for searching the web. Stranded in the middle of this global electronic library of information without either a card catalog or any recognizable structure, how else are you going to find what you're looking for?

CONS:
On the down side, the sheer number of words indexed by search engines increases the likelihood that they will return hundreds of thousands of responses to simple search requests. Remember, they will return lengthy documents in which your keyword appears only once.

Additionally, many of these responses will be irrelevant to your search.

ARE SEARCH ENGINES ALL THE SAME?

Search engines use selected software programs to search their indexes for matching keywords and phrases, presenting their findings to you in some kind of relevance ranking. Although software programs may be similar, no two search engines are exactly the same in terms of size, speed and content; no two search engines use exactly the same ranking schemes, and not every search engine offers you exactly the same search options. Therefore, your search is going to be different on every engine you use. The difference may not be a lot, but it could be significant. Recent estimates put search engine overlap at approximately 60 percent and unique content at around 40 percent.
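To get a feel for overlap, you can compare the results two engines return for the same query. The sketch below assumes one particular definition, which is mine and not necessarily the one behind the estimates quoted above: overlap is the share of all distinct results that both engines returned, and unique content is the remainder. The result lists are placeholders.

```python
def overlap_report(results_a, results_b):
    """Return (overlap share, unique share) across two engines' result lists."""
    a, b = set(results_a), set(results_b)
    combined = a | b          # every distinct result from either engine
    shared = a & b            # results both engines returned
    overlap = len(shared) / len(combined)
    unique = (len(combined) - len(shared)) / len(combined)
    return overlap, unique

# Placeholder results for the illustration.
engine_a = ["page1", "page2", "page3", "page4"]
engine_b = ["page2", "page3", "page4", "page5"]

print(overlap_report(engine_a, engine_b))  # (0.6, 0.4)
```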

HOW DO SEARCH ENGINES RANK WEB PAGES?

In ranking web pages, search engines follow a certain set of rules. These may vary from one engine to another. Their goal, of course, is to return the most relevant pages at the top of their lists. To do this, they look for the location and frequency of keywords and phrases in the web page document and, sometimes, in the HTML META tags. They check out the title field and scan the headers and text near the top of the document. Some of them assess popularity by the number of links that are pointing to sites; the more links, the greater the popularity, i.e., value of the page.
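A toy version of such a ranking scheme might look like the Python sketch below. The weights (3 points for a title match, 2 for appearing in the first ten words) and the sample pages are arbitrary choices made for this example; real engines keep their formulas to themselves.

```python
def score(page, keyword):
    """Score a page by keyword frequency, location, and link popularity."""
    kw = keyword.lower()
    title_words = page["title"].lower().split()
    body_words = page["body"].lower().split()
    frequency = body_words.count(kw)               # how often the keyword appears
    in_title = 3 if kw in title_words else 0       # bonus for the title field
    near_top = 2 if kw in body_words[:10] else 0   # bonus for text near the top
    return frequency + in_title + near_top + page["inbound_links"]

def rank(pages, keyword):
    """Return page titles, most relevant first."""
    ordered = sorted(pages, key=lambda p: score(p, keyword), reverse=True)
    return [p["title"] for p in ordered]

# Made-up pages for the illustration.
pages = [
    {"title": "Music Notes",
     "body": "an overview of many styles including blues rock folk country and jazz",
     "inbound_links": 1},
    {"title": "Jazz History",
     "body": "jazz began in new orleans and jazz spread north",
     "inbound_links": 5},
]

print(rank(pages, "jazz"))  # ['Jazz History', 'Music Notes']
```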

WHEN DO YOU USE SEARCH ENGINES?

Search engines are best at finding unique keywords, phrases, quotes, and information buried in the full-text of web pages. Because they index word by word, search engines are also useful in retrieving tons of documents. If you want a wide range of responses to specific queries, use a search engine.
NOTE: Today, the line between search engines and subject directories (see Lesson 3) is blurring. Search engines no longer limit themselves to a search mechanism alone. Across the Web, they are partnering with subject directories, or creating their own directories, and returning results gathered from a variety of other guides and services as well.

EXAMPLES OF INDIVIDUAL SEARCH ENGINES:

* Google
* Ask

EXAMPLES OF SEARCH ENGINES THAT HAVE PARTNERED WITH SUBJECT DIRECTORIES:

* Gigablast
* Yahoo! Search

In the next post, I will be talking to you about metasearch engines.

Yours Truly,
Ferdinand Che

Great Search Engines

Need information? There are hundreds of search engines out there on the web. . .

Five major "search engines" do stand out, however, for their massive catalogs of information: Google.com (and its Amazon.com variant, A9.com), AllTheWeb.com, AskJeeves.com, Vivisimo.com and Dogpile.com.

These five database engines use "spiders" (automated programs) to read thousands of pages per day, and index them for easy finding later.

Three major "search directories" also stand out for their voluminous catalogs: Yahoo's Directory, DMOZ.org, and About.com. Different from search engines, these three search directories use human editors and reader submissions to hand-pick their cataloged content. With human reading being much slower than robot spiders, you can expect search directories to be much smaller than search engines. The human editor element, however, does add the filter of human judgment, which can help cut down the drivel you have to sift through when searching.

So, when it comes to the question, "which search tool is the best?", the real question should perhaps be: "which search tool do you personally prefer?"

Google.com has the least advertising on its screen, and the most indexed content of all the search engines.

DMOZ is slower to load, but it has excellent depth of content.

Vivisimo uses "clustering" to present results in categorized format. About.com has lots of advertising, but has amazing subject matter expertise.

Ask Jeeves, Dogpile, and AllTheWeb have their pros and cons, too.

There are almost 300 other search tool choices not even listed here. Whichever you personally prefer, every one of these search tools contains more content than you or I could ever read in a lifetime! The smart choice, accordingly, would be to test and compare these major search tools for yourself.

Don't settle for one search tool! Use different search engines and directories in combination! Not only do search tools change their appearance every few months, but you are also more likely to locate higher-quality web pages when you combine the high volume of spidered content with the hand-picked reviews of human editors. Rotate your search tools, avoid the rut of relying on only one search engine, show some perseverance and patience, and you will get good results.

Yours Truly,
Ferdinand Che