It has been five days since a massive earthquake off the coast of Sumatra spawned the tsunamis that killed over 114,000 people. Images from the affected region show absolute destruction in the wake of the worst natural disaster in living memory. Hospitals, businesses, schools and entire villages were swept away in the span of 20 minutes. The full extent of the horror and devastation will never be understood, even by those who survived it. Given the scope and locale of the disaster, it is likely that nearly everyone on Earth will be affected by the loss of a friend, colleague or loved one.
Communications from the region have been sporadic due to the destruction of much of the local infrastructure. Despite the wall-to-wall coverage provided by the mainstream media and the tireless relief efforts of the Red Cross and other NGOs, contact between survivors and their families is difficult. Immediately after the disaster struck, several Blogs began posting messages from survivors. As the week has unfolded, other Blogs have started filling information gaps left by the lack of civilian infrastructure.
One of the most comprehensive is a collaborative effort of 30 – 40 international bloggers at http://tsunamihelp.blogspot.com. Posted within hours of the first reports, the site offers information on how to donate to specific areas and how to find out information about loved ones.
Some of the most gripping accounts of the devastation can be found in the first person stories at ChiensSansFrontiers.
As each day wears on, the death toll increases by tens of thousands. The inevitable spread of waterborne diseases is expected to push the number above 200,000. This is the worst natural disaster in living memory and one of the worst in human history. The International Red Cross is asking for cash donations, partially in the hopes of helping rebuild shattered local economies by purchasing relief supplies as close to the affected areas as possible. Due to the sheer numbers of people trying to connect, the servers for many Red Cross branches have been failing recently.
If you have not already donated money, blood or both, please contact your local branch of the Red Cross.
Recently, a friend of mine bought a new car. Buying a car can be extremely stressful with an enormous array of important numbers, specifications and comparative measurements to consider before purchase. Now I was raised in the downtown core of Toronto which is a megalopolis stretching around the northwestern quarter of Lake Ontario. Growing up with a highly efficient public transit system and a decidedly urban lifestyle, I never even considered the need for a vehicle until I was in my mid 20's. Ten years later I am still at the "this moves me where?" stage in my relationship with vehicles. Taking me to a car-dealership is sort of like asking another city-kid which mushrooms are safe to eat in the forest. "Hey that one looks cool..." Things can get pretty Mickey Mouse from here eh?
Being Mr. Urban-boy, I took a common sense approach to this problem. In order to appear less dense than I actually am, I did a bit of research on the types of cars my buddy was interested in. There is a lot of information out there. Being an SEO, I have a knack for easily finding information quickly. I found a lot of information and began to compile small dossiers on several vehicles that my friend mentioned. I can tell you about torque ratios and fuel efficiencies and anti-lock braking systems and all sorts of other stuff about several different models.
I was about to learn a very sad truth about such matters. When it comes to new cars, I don't really know what I am talking about. I know enough to make conversation with another person but when it really comes down to it, lots gets lost in transmission.
Armed with numbers and knowledge, I felt somewhat comfortable helping my friend avoid getting sharked by a salesperson, at least in my head. In my heart however, I knew I was descending into a realm I've never really needed to understand before. While I already understood the basics of internal combustion engines, a glance under the hood of a 2004 model showed a very different design than the V.W. or Slant6 engines I've seen over other friends' shoulders. After about five minutes, I decided my best contribution would be to simply stop asking questions and just watch the person selling the car. Let me tell you, people selling cars can throw numbers around and they sure know a lot about the vehicles they sell. Some of them were really nice folks. Others could have been typecast for their roles as totally scuzzy car dealers. The experience reminded me of a part of the cyber-world that is very close to my heart.
After my experience "helping" my friend find a new car, I thought about tools that provide businesses with information about their websites and online marketing efforts. There are a number of free SEO analytic tools out there for webmasters and site developers to work with. In many cases, these analytic tools offer a lot of numbers but very little actual analysis and function as sales devices for SEO or SEM firms.
Being able to access stats regarding the number of incoming links or the number of words found on a page does not necessarily give one the full knowledge needed to practice SEO. Search is a complicated field that has never provided a static environment. That complexity is the primary reason the SEO sector exists. When it comes to structuring an online marketing campaign, having hard facts about your website gives you the ability to make informed decisions, especially when you don't have the luxury of examining the eyes of the salesperson on the other end of the phone.
Still, knowing all the numbers doesn't really mean one knows the score. What do the numbers really mean in relation to each other or in relation to a competitor's site? Here is a basic guide to analytic data you should be looking at.
W3C Compliance:
The World Wide Web Consortium (W3C) is the body that sets technical standards on the web. Being certain your site is W3C compliant helps ensure it can be read by any search engine spider. Look for a tag at the very top of your source code that looks something like this:
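As a rough illustration, and assuming the tag in question is a DOCTYPE declaration (the line that tells browsers and validators which W3C standard a page claims to follow), the short Python sketch below checks whether a page's source opens with one. The function name and the sample markup are hypothetical. Passing this check does not mean a page validates; it only confirms that a declaration is present.

```python
import re

# Matches a DOCTYPE declaration at the start of the source, ignoring
# leading whitespace and letter case.
DOCTYPE_PATTERN = re.compile(r"^\s*<!DOCTYPE\s+[^>]+>", re.IGNORECASE)


def has_doctype(source: str) -> bool:
    """Return True if the page source opens with a DOCTYPE declaration."""
    return DOCTYPE_PATTERN.match(source) is not None


# Hypothetical sample page using the HTML 4.01 Transitional declaration.
sample = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
    '"http://www.w3.org/TR/html4/loose.dtd">\n'
    "<html><head><title>Example</title></head><body></body></html>"
)

print(has_doctype(sample))
print(has_doctype("<html><body>No declaration here</body></html>"))
```

A full compliance check still means running the page through a validator; this only catches the most basic omission.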
Title:
There are no common rules for the length of a page title but conventional wisdom says the greatest "power area" is found in the first 40 characters. If your page titles do not have keywords found in the first 40 characters, chances are you will want to have them rephrased. You will also want to ensure that each page in a site has a unique, topic-specific title.
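For anyone who wants to test this on their own pages, here is a minimal Python sketch that pulls the title out of a page and checks whether a keyword phrase falls inside the first 40 characters. The class and function names are made up for this example, and the 40-character figure is only the conventional wisdom mentioned above, not a published rule.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def keyword_in_power_area(html: str, keyword: str, limit: int = 40) -> bool:
    """Return True if the keyword appears within the first `limit` characters of the title."""
    parser = TitleParser()
    parser.feed(html)
    title = " ".join(parser.title.split())  # collapse stray whitespace
    return keyword.lower() in title[:limit].lower()


# Hypothetical pages for illustration.
good = "<html><head><title>Widget Repair in Victoria | Acme</title></head></html>"
weak = ("<html><head><title>Welcome to the official home page of Acme "
        "Incorporated: widget repair</title></head></html>")

print(keyword_in_power_area(good, "widget repair"))
print(keyword_in_power_area(weak, "widget repair"))
```

A phrase that straddles the 40-character boundary counts as a miss here, which errs on the side of prompting a rewrite.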
Meta Description and Meta Keyword Tags:
There are two meta tags that are important to search engine rankings, the description and keyword tags. Of these two tags, the description is the more important, though the keyword tag is thought to carry a very small weight on some search tools. Both tags should be kept below 190 characters and have the strongest keywords or phrases as close to the beginning of the tag as possible. It should be noted that alterations to either of these tags will affect another important analytic measure, keyword density.
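The length check is easy to automate. The sketch below, written in Python with hypothetical names, reads the description and keywords meta tags from a page and reports whether each stays under the 190-character guideline given above. It says nothing about whether the wording is any good.

```python
from html.parser import HTMLParser


class MetaParser(HTMLParser):
    """Collect the content of the description and keywords meta tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in ("description", "keywords"):
            self.meta[name] = attributes.get("content") or ""


def audit_meta(html: str, limit: int = 190) -> dict:
    """Map each tag name to a (length, is_below_limit) pair."""
    parser = MetaParser()
    parser.feed(html)
    return {name: (len(text), len(text) < limit)
            for name, text in parser.meta.items()}


# Hypothetical page: a short description and an overstuffed keywords tag.
page = (
    '<head><meta name="description" content="Widget repair in Victoria">'
    '<meta name="Keywords" content="' + "widget, " * 30 + '"></head>'
)

for name, (length, ok) in audit_meta(page).items():
    print(name, length, "ok" if ok else "too long")
```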
Use of Heading Tags:
Headings should be used like page headlines. A good analytic tool will tell you how many H1, H2, or H3 tags are used on each page analyzed. Search engines tend to give a bit more weight to keywords phrased in heading tags; however, they also penalize sites that misuse heading tags. Knowing the number of times a heading tag is used doesn't tell you if that is the optimal way to use such tags. It is also difficult to offer general advice on the use of heading tags except to suggest that limiting the use of these tags is generally wise.
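Counting the tags is the easy part, as this small Python sketch shows (the names are invented for the example). As noted above, the count alone does not tell you whether the headings are being used well.

```python
from collections import Counter
from html.parser import HTMLParser


class HeadingCounter(HTMLParser):
    """Count how many H1, H2 and H3 tags appear on a page."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case.
        if tag in ("h1", "h2", "h3"):
            self.counts[tag] += 1


def count_headings(html: str) -> dict:
    """Return a dict such as {'h1': 1, 'h2': 2} for the given page source."""
    parser = HeadingCounter()
    parser.feed(html)
    return dict(parser.counts)


# Hypothetical page fragment.
page = "<h1>Widget Repair</h1><h2>Pricing</h2><p>Text</p><H2>Hours</H2>"
print(count_headings(page))
```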
Use of IMG-ALT Text:
A good analytics tool will tell you how many images on a page use ALT text; however, most analytics tools will not tell you if ALT text is used wisely. Image ALT text is the text shown in place of an image that cannot be displayed, and some browsers also show it when a mouse hovers over the image. It is primarily used as an accessibility tool allowing page readers to describe an image for visually impaired visitors. Image ALT text is also being used as a SPAM tool by some search engine marketers.
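Here is a minimal Python sketch of the counting side of that job, with hypothetical names. It tallies images with and without ALT text and lists the text it finds so a person can judge whether it is descriptive or just stuffed with keywords. For simplicity it treats an empty ALT attribute as missing, which is an assumption of this example.

```python
from html.parser import HTMLParser


class AltAuditor(HTMLParser):
    """Tally images with and without ALT text and keep the text found."""

    def __init__(self):
        super().__init__()
        self.with_alt = 0
        self.without_alt = 0
        self.alt_texts = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        alt = dict(attrs).get("alt")
        if alt and alt.strip():
            self.with_alt += 1
            self.alt_texts.append(alt.strip())
        else:
            # Missing or empty ALT attribute (assumption: both count as absent).
            self.without_alt += 1


def audit_alt(html: str) -> dict:
    """Return counts and the list of ALT texts for human review."""
    auditor = AltAuditor()
    auditor.feed(html)
    return {
        "with_alt": auditor.with_alt,
        "without_alt": auditor.without_alt,
        "alt_texts": auditor.alt_texts,
    }


# Hypothetical page fragment.
page = '<img src="a.gif" alt="Blue widget"><img src="b.gif"><img src="c.gif" alt="">'
print(audit_alt(page))
```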
HTML Size (or Page Size):
Good analytic tools will tell you the size of each page on your website. Generally, the smaller the number the better, as small pages load faster and are more likely to address one topic per page. If your analytic tool tells you your page size is very large, that is likely an indication you need to restructure your website.
Keyword Density:
Keyword density refers to the keyword/non-keyword ratio of the site. This is a touchy area, as many in the SEO community do not believe keyword densities play a factor in organic placements, but it is an often analyzed page element. Scott Van Achte, head SEO at StepForth Placement, does consider keyword densities; however, he believes that the “optimal” keyword density is directly related to the other pages listed in the Top10 placements. This is an area you would want to address with a professional SEO firm.
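To make the number less mysterious, here is a rough Python sketch of one way a density figure can be calculated. It assumes density is measured as the share of all words on the page that belong to the keyword phrase, which is one common convention; tools differ, and the function name is hypothetical. Running the same function over the pages already in the Top10 gives the kind of baseline described above.

```python
import re

WORD_PATTERN = re.compile(r"[a-z0-9']+")


def keyword_density(text: str, phrase: str) -> float:
    """Share of the words in `text` that belong to occurrences of `phrase`."""
    words = WORD_PATTERN.findall(text.lower())
    target = WORD_PATTERN.findall(phrase.lower())
    if not words or not target:
        return 0.0
    n = len(target)
    # Count every position where the full phrase appears word for word.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits * n / len(words)


# Hypothetical page copy.
copy = "Real estate listings for Maryland real estate buyers"
print(f"{keyword_density(copy, 'real estate'):.0%}")
```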
Site Structure:
This refers to how a website is constructed and is a fairly dense area to work through. Please note: Given the numerous types of sites, databases and design tools, a general view on site structure is rather difficult to present, an issue shared by all analytic tools, tech-writers included. If you think your site structure might have an adverse effect on site rankings, you should speak to an SEO. Here is some general advice on analytic tools and site structure though.
The first thing a good online analytic tool will do is tell you if a search engine spider can read your website. A problem here is that (generally speaking), most search engine spiders are more advanced than the free SEO analytic software. Quite often an older tool will tell a webmaster that their site is not open to search engine spiders when the site is in fact wide open.
Next, a good tool will offer a representation of link paths found within the site. Users should be able to see each link listed, including the anchor text used to phrase the links. Any dead links should also be exposed by the tool. If there are critical SEO issues posed by the structure of the site, a really good free tool will offer corrective suggestions as well.
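The Python sketch below shows the basic idea under simplified assumptions. It lists every link on a page with its anchor text, then flags internal links that point to pages that do not exist. The site is represented as an in-memory dictionary of path to HTML, and only site-relative links beginning with "/" are checked; both are assumptions made to keep the example self-contained, and all names are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs from a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            anchor = " ".join("".join(self._text).split())
            self.links.append((self._href, anchor))
            self._href = None


def extract_links(html: str) -> list:
    """Return the links on a page as (href, anchor text) pairs."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def find_dead_links(pages: dict) -> list:
    """Return (page, href) pairs for internal links to unknown paths."""
    dead = []
    for path, html in pages.items():
        for href, _anchor in extract_links(html):
            if href.startswith("/") and href not in pages:
                dead.append((path, href))
    return dead


# Hypothetical two-page site.
site = {
    "/": '<a href="/about">About us</a> <a href="/old">Old page</a> '
         '<a href="http://example.com">Partner</a>',
    "/about": '<a href="/">Home</a>',
}

print(extract_links(site["/"]))
print(find_dead_links(site))
```

A real tool would fetch pages over the network and resolve relative URLs, which this sketch deliberately leaves out.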
An important thing to note is that most analytic tools examine individual webpages not entire websites. You want to be certain you have a full analysis of your entire website before undertaking a major redesign or SEO effort.
Incoming Links:
One of the biggest factors shared by all major search engines is that spiders find pages by following links. Furthermore, the number of links directed to a specific page has an effect on the placement of that page.
A good analytic tool will tell you exactly how many incoming links are directed to the page being studied. A really good tool will give you an active list of these links; however, most free tools do not generate such lists. Link analysis is an important part of SEO work, but this is another area in which most analytic tools simply cannot offer a full picture of the effectiveness of current incoming links.
Overall, website analysis is very complex, made more difficult by the fact that analysis of competing websites is critical to establishing useful baselines. It is important to remember that most online analytic tools look at pages, not entire sites. Webmasters and marketers are urged to gather as much information as possible before considering search engine optimization, whether in-house or out-sourced to a professional firm. Above all, use the numbers gathered in analytic study of the various pages in your website to quiz whomever you are considering for your SEO effort. If every topic addressed above generates a thoughtful response (even if it challenges what I've written), chances are you are talking to a pro.
By the way, my friend has a nice new car. At the end of what turned out to be a very long day, my friend settled on a great car purchased from the salesman he trusted the most. After I stopped trying to be an expert on cars, I reverted to an expertise we all share: I used my meager knowledge to probe my sense of trust.
The past year saw immense growth in the search sector. Search is bigger today than it was twelve months ago in every respect. With the Internet becoming a larger part of people's lives and broadband access becoming the norm around the world, 2004 was the year that big business recognized the full impact of search.
The search sector drives web-traffic by providing each web user with the dynamic roadmaps and signposts that make the web usable. This fact has finally become staggeringly obvious to anyone with an interest in the web. That these roadmaps are self-generating and are increasingly influenced by the interests of the individual user makes search the most powerful medium in the world. The largest of the search firms have found a stable business model in paid contextually delivered advertising that promotes growth while providing unequaled opportunities for advertisers.
Sensing the enormous potentials, investors piled money into a sector that was super-heated by interest surrounding Google's IPO. Eighteen months of mega-money funding set the stage for the influx of innovative features and tools each of the major players introduced recently. The presence of so much money has also sparked grassroots innovation seeing an increasing number of formal start-ups and home-baked software design enthusiasts produce an array of search related tools and products. Today, nearly every digital product can be searched in one way or another. Among the greatest developments of the year was the expansion of search engine databases to include a variety of file formats previously inaccessible to search engine spiders.
Investment in the growth of the search industry coincides with vast improvements in US home Internet access options that until recently acted as a long-term construction-zone on the information super-highway.
Broadband access in the US has crossed the 50% mark. The introduction of affordable high-speed access for US consumers is one of the most important milestones in the development of the Internet. According to Nielsen/NetRatings, as of October, 53% of US home Internet users have broadband access. While the general behaviors of American Internet users have not yet changed, the increasing number of high-speed users allows the delivery of a wider array of information directly to home users. From interactive appliances to the replacement of traditional print media to altering social interaction, broadband access changes the way people do things. Now that the majority of American Internet users have high-speed home access, the Internet can start to meet much more of its actual potential.
For most of the western world, high-speed home access has been a reality for several years. Legal bickering amongst the American cable and telephone cabals had delayed introduction of affordable services to most US consumers until this year. Now that the most obvious digital divide between the US and the rest of the wired world has been bridged, software and entertainment producers can begin to exploit personal digital distribution of their products. The adoption of high-speed access by US home users will have a major, positive impact on the business of search as US users will almost certainly imitate the actions of users in other areas that have had broadband access for years.
The difference is measured in time. Whenever it is easier or faster to find information on the Internet than it is to make a phone call or send a fax, broadband users will always tend towards using the Net. For most US home Internet users, it could take between 45 – 90 seconds to connect to the net using dial-up. With broadband, the connection between the Internet and the home computer is never severed. Assuming most businesses have useful websites, finding detailed information about a local business or event is almost always faster online than on the phone. Printed telephone directories will be used less, as will telephones in general.
It won't be long before Hollywood and Bollywood deliver feature films via the Internet directly to home consumers. We already see the music industry moving towards digital distribution of its products, following the lead of the online gaming industry that has been widely enjoyed by users with high-speed access. As a matter of fact, there are now several search tools that find information from television shows by scanning the closed-caption commentary included with many programs. Blinkx TV(beta) captures and indexes video and audio streams directly from television and radio broadcasters to make news, sports and entertainment clips available. Microsoft XP Media Center and TiVo products are both based on the assumption that broadband connection will be the global standard.
Regardless of where the web is going and the role the search sector is going to play in it, most individuals and businesses rely on the free, organic listings. Those listings will remain an important focus for the search engines as they will continue to provide the primary interactive point between home-user and the search engines. The impact of organic placements will obviously be enhanced by the growth of the search sector; however, once users follow a link from the organic SERPs, they will likely encounter a great deal of paid advertising everywhere else they go. Repetition is the key to memory and competitive advertisers should note the seemingly unlimited power of paid-contextual advertising, especially Google's AdWords program. When users don't encounter AdWords, they almost certainly encounter advertising from Overture, AskJeeves, FindWhat/Espotting, and others, as they all have their plans for 2005.
2005 is going to be an extremely intense year. If things are quiet and peaceful in your universe over the next few weeks (and here's hoping they are), take a break and read as much as you possibly can. If you have the time to explore, mess around with the new tools and features. Take some professional development time to learn a bit about XML, RSS, FLASH and PHP. Ask your family and friends about their search habits. You may be surprised at the new sophistication that is shaping up. The future, at least as far as the search sector is concerned, is going to be very friendly and increasingly informative. Now that the web is going to become faster for its largest population, it is also going to be increasingly interesting.
MSN held a massive telephone news conference earlier this week to announce its version of a desktop search application. Like Google Desktop, MSN's offering spiders and indexes various files found on your computer's hard-drive such as Word documents, Acrobat files, PowerPoint presentations, and spreadsheets. Unlike Google Desktop, this program catalogs a wider variety of files such as Email attachments, photos, music, and even software packages. It also displays results differently, using a pop-up window that changes as you type, gradually reducing the number of references as the topic of your search is typed into the search-box window. MSN desktop also comes with its own handy toolbar, a means of merging two appliances into one package. The biggest drawbacks I have found thus far are that it only accesses Email from MS Outlook and does not store records of websites previously visited like Google's desktop feature does. It also requires the installation and use of the MSN toolbar, which cannot be used in conjunction with other toolbars.
MSN's offering places it in the middle of several other desktop appliances released such as Google, Lycos, and Copernic. Both Yahoo and Ask Jeeves are expected to release desktop applications early in the new year.
This is the last edition of the StepForth Weekly News for 2004, making this the perfect time to write a retrospective before moving into the new year. The past year will be remembered as the most interesting year in the history of search, that is until this time next year. 2004 witnessed the end of the search engine cold-war and the beginning of what is likely to be an intense rivalry between Google and MSN. It also showed a clear demarcation between who's hot and who's not in the business of search.
There were more subtle shifts in the business of search last year than most of the previous years combined. 2003 was the watershed year of mergers and acquisitions, a trend that continued well into 2004, but it wasn't until mid-summer that the growth of the industry started to make a lot of sense. The obvious winners of 2004 were the Big3: Google, Yahoo and MSN but underpinning the success of the Big3 were the real winners of 2004; the writers of add-ons, features and innovative technologies related to search.
2004 was the year Blog became a household word and the year that Bloggers fundamentally changed the face of the Web. Blogs were the most powerful tool popularized in the past year and are now supported by every major player in the search field. Bloggers heavily influenced Google rankings, causing Google to change the way it weighs and values incoming links. Bloggers have also changed the tone of journalism and opened a new information publishing frontier to the general public. The first major Blog-based search tool I know of was developed by Loren Barker for Mark Cuban's search engine IceRocket.
The past year was one of announcements, one-ups and positioning as the major search engines struggled to roll out as many improvements and innovations as possible. Items such as search-engine specific toolbars, desktop search applications, local-search features and super-sized Email accounts were introduced to win and retain the loyalty of users. The various battlefronts of the search engine war shifted enormously over the past year, ultimately offering search users 3 unique major search engines, the widest array of independent choices seen in almost four years. At this time last year, Google dominated the organic listings by providing the database for most of its rivals. That changed in the first quarter of 2004 when Yahoo introduced its own algorithmic search database. MSN followed with the release of its own search engine late in the third quarter. Even with the growth of its rivals, Google continued to dominate the news this year and was the ultimate winner in 2004.
Many if not all decisions and initiatives in the search industry, regardless of where or by whom they were made, had one common factor. Google's successful IPO had the biggest influence on the business of search last year. Development and innovation throughout the search industry was promoted by the IPO much like the search sector was dominated by Google's database in 2003. For rivals, there was and continues to be an overwhelming fear of Google's seemingly limitless growth plans. Those watching the industry should not make the same mistake the pros did in 2004 by assuming Google's sometimes juvenile hubris demonstrated a lack of long-term planning. Over the last quarter of this year, Google showed that it has as many plans as it has patents, making it almost impossible to predict what the landscape will look like twelve months from now. Given Google's growth, assume the landscape is going to be much larger, covering more of what the Internet can deliver to home and business consumers.
While expansion and introduction of new services was the way of the search world, many of the new products rolled out by search services seem to be copycat productions. Every search tool has a toolbar and each is interested in desktop search. Google was the first of the Big3 to introduce a functioning desktop search feature with MSN introducing their version earlier this week and Yahoo expected to release its version in January 2005. While Google Desktop gathered the most print-space this year, it wasn't the first of the well known names to introduce a desktop appliance. That bragging right goes to Lycos/Hotbot which released a very good desktop search feature in March. Since then, everyone else has fallen over their own feet trying to release their version of desktop search.
The other major trend-setting innovation seen in 2004 was the advent of Local search features. Google and Yahoo dominate the local search market but MSN and several smaller rivals have also shown great interest in local search. At this time, it is difficult to state who is really ahead in this field as both Google and Yahoo offer highly credible local search features. Google likely has the dominant positioning though as it has brokered deals with most of the major telephone directory services to integrate their databases into Google's Local search tool. It is still very easy to get your site into Yahoo local as well.
Assigning the role of losers, while fitting with the "winners" theme, is more difficult. The "big losers" of 2004 (if one can call them that) didn't really lose much at all, and continued to introduce technically strong products such as Lycos/Hotbot's desktop search feature. When considered against the Big3 however, the smaller players didn't gain enough ground to be considered major players anymore. The search landscape of early 2005 is fundamentally dominated by Google, Yahoo and MSN. Given the growth of all three over the past twelve months, it will be difficult if not impossible to beat them in the next twelve. Smaller players shouldn't lose heart though. 2005 is going to be a time of immense change on the Internet and in the world of search, and that change will likely leave some room for maneuvering on the part of AskJeeves, AOL, Lycos and Vivisimo.
Next edition, we will be making our predictions for 2005! Where do you think the world of search is going? Anyone interested in sharing their predictions is welcome to write me over the holidays at jimhedger@stepforth.com . We will be interested in adding ideas from all over the world to the next edition.
What would you do if you were tasked with designing a new search engine?
You have all the resources the world can offer and the certain knowledge that your project is so important to your employer that mountains, molehills, companies, code and really comfy office chairs will be moved, built or acquired to meet your needs, no questions asked. Your boss demands a product that is better than best and, having failed to notice how overwhelmingly essential search would become back when he came to dominate everything else, appears ready to back your project with missionary zeal and Machiavellian maneuvering. The cold hard truth is, the future of one of the largest corporations in the world, owned incidentally by the world's wealthiest man, may well rest on your shoulders. In this scenario, there are no obstacles, only the challenge of beating Google at Google's best game. Whoa....
MSN released the beta version of their long awaited proprietary search engine earlier this quarter. Beta releases are the software world's version of a dress rehearsal. Mistakes will happen, even in the best productions, and the beta stage is the place to field-test a product, finding and fixing inevitable problems before the real, commercial version of the product is introduced. MSN(beta) search has seen its share of bumps over the past few weeks including a short period when it appeared the search tool had crashed. Regardless of any minor mishaps in its first weeks, MSN(beta) Search shows very good results generated from a database of approximately 5 billion spidered websites it began compiling over a year ago. While MSN(beta) and the search tool found at MSN.Com are different search tools delivering very different sets of results, the results generated by MSN(beta) will eventually replace the Inktomi based listings shown on MSN.Com. That's when the real fun will begin. Please note, as other commentators have pointed out, this is a BETA version and likely to change in coming weeks before the undisclosed live release date.
When told to build a better mousetrap, MSN engineers set their goals fairly high and approached the problem from the most logical point possible. They seem to have looked at the best ideas everyone else has come up with and tried to incorporate them into their search tool. The results are better than expected, with highly relevant site listings that have been compared to earlier versions of Google's index. That makes sense given that MSNBot, the beta-search spider, works very much like GoogleBot, looking for many of the same site elements including incoming links, contextual relationships between linked documents, and overall site context. MSNBot also seems to be interested in keyword-enriched titles and seems especially interested in anchor text.
MSNBot, like GoogleBot and Slurp, finds sites for its index by following links from one page to another within or between sites. The majority of sites in MSN(beta)'s index were found by MSNBot as it followed links from sites it had already visited. A check of backlinks, or links recognized by MSNBot as being relevant to a specific site, almost always shows much higher numbers than a similar check on Google or Yahoo, leading us to conclude that, for the time being at least, MSNBot does not filter links to the same degree as its rivals. In other words, relevancy does not appear to be as strong a factor with this version of MSN(beta) as it is with Google, at first glance anyway. One of the biggest improvements MSN(beta) brags about is its ability to figure out the context of individual paragraphs found on a page and apply that context as a "relevancy" factor against pages that might be linked to from that paragraph. Subsequent paragraphs on the same page might be about totally different topics without undermining the contextual relevancy of the links found in the previous paragraph. Google tends to compare relevancy on a page to page basis, making it more difficult to address a wide ranging topic on one page.
As with Google and Yahoo's spiders, MSNBot likes well defined and functioning link paths within your website. Providing a clear and well explained path for MSNBot to follow is critical to good rankings. The easiest way to accomplish this is to establish a text-based sitemap page appended to your website and be certain there is a link to that sitemap page on each of the other pages in your site. For database driven sites, this can be accomplished by changing the "footer" attribute on the template that creates the base-pages. There is an important thing to note here, especially for webmasters of highly dynamic or commerce driven sites: use static URLs to link to products in your database and do whatever is necessary to avoid tracking systems that append unique user IDs to URLs.
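To show what a cleaned-up link might look like, here is a small Python sketch that strips session and user ID parameters from a URL before it is written into a page. The parameter names in the list are hypothetical examples; the real names depend entirely on the tracking or shopping-cart system a site uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; adjust to match your own tracking system.
TRACKING_PARAMS = {"sid", "sessionid", "phpsessid", "userid"}


def strip_tracking(url: str) -> str:
    """Return the URL with any per-user tracking parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


print(strip_tracking("http://www.example.com/product.php?id=42&PHPSESSID=abc123"))
print(strip_tracking("http://www.example.com/widgets/blue.html"))
```

The point is that every visitor, spiders included, should see the same address for the same product.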
This article is not going to provide a lot of details around these elements as some or even much of what is written is subject to sudden change (this is a beta version after all), and the beta version simply hasn't been around long enough to express reliable ideas in writing yet. Once you have ensured that MSN(beta)'s spider can travel from one end of your site to another, and has a way into your site from an outside reference, take a look at the following elements of your site.
MSNBot seems to really like the techniques used by SEOs at StepForth. StepForth pays a lot of attention to keyword enrichment of the basic but critical elements of a site. Assuming navigation issues have been taken care of, websites that use keyword phrases in titles, anchor text, and early in the page content are doing very well in MSN(beta)'s index. We do not know for sure what MSNBot thinks of meta tags; however, we recommend using the basic description and keywords meta tags along with robot exclude text when necessary. MSNBot basically likes clean code with good, common sense SEO. In a previous article, we republished the guidelines MSN posted to the MSN(beta) search site.
MSNBot Guidelines, at a glance:
Incoming links from other websites with keyword-enriched anchor text used to phrase the links
Easily read code that has been W3C validated
As with all search engines, best results are found when you only address one topic per page
Keep your page size reasonable; 150kb is the maximum size recommended in the MSN guidelines
Apply keyword phrases to well written sentences early in the code. Don't use techniques such as keyword stuffing or invisible text.
Use a sitemap to ensure that every page in your site is open to MSNBot.
There is a keyword density rule for MSNBot; however, we do not think that keyword density is the same for every business sector. For instance, the optimal keyword density for Maryland real estate will be different from the optimal keyword density for California real estate, even though sites found under those keywords represent the same business sector.
Any common sense rule that applies to SPAM on other search engines applies at MSN(beta) as well.
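Two of the guidelines above are easy to check mechanically: the 150kb page size ceiling and keyword density. The sketch below shows one way to measure both. The density formula (words belonging to the phrase divided by total words) is a common convention we are assuming here; MSN has not published how it counts, and the function names are ours.

```python
import re

MAX_PAGE_BYTES = 150 * 1024  # the 150kb ceiling recommended in the MSN guidelines


def page_size_ok(html: str) -> bool:
    """True if the page, encoded as UTF-8, fits within the recommended size."""
    return len(html.encode("utf-8")) <= MAX_PAGE_BYTES


def keyword_density(text: str, phrase: str) -> float:
    """Fraction of the words in `text` that belong to occurrences of `phrase`.

    This is an assumed, conventional definition, not MSN's own formula.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = re.findall(r"[a-z0-9']+", phrase.lower())
    if not words or not target:
        return 0.0
    n = len(target)
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits * n / len(words)
```

Because we do not believe the ideal density is the same in every sector, a number like this is most useful for comparing your page against the pages that already rank well for the same phrase.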
The MSN(beta) search engine is slated for full release any time now but, as with other Microsoft products, that doesn't necessarily mean we're going to see it anytime soon. The engine has been very stable over the past two weeks and is providing very strong and consistent results. Any bugs that remain to be worked out are well hidden and do not seem to be affecting the search function in any discernible way. When MSN does release its search engine as a full version at MSN.Com, it will have a good tool that presents a credible alternative and serious challenge to Google and Yahoo. The long days of mono-culture search are over.
Innovation in the world of search seems to come in waves, with the major search engine firms appearing to follow each other's lead in the development of new products, tools and services. Witness today's introduction of a desktop search/toolbar by MSN. Search engines are standardizing their services around the basic business model of contextual ad delivery and introducing new products and features designed to win the loyalty of new users and retain the loyalty of old ones. The past year has been one of the most expansive and interesting in the world of search since day one. Two major trends, personalization and localization, combined with the competitive necessity to gain users and advertisers, provided the foundation for development of desktop search applications and the immense number of toolbars available now. The goal of all major search firms is to offer results that are relevant to an individual searcher's profile in the fewest steps possible. User adoption of toolbars and desktop search are major steps in accomplishing that goal.
Advertising in the form of increasingly personalized, contextual delivery is going to pay the bills, at least for the foreseeable future. The Internet is about to undergo its most massive growth spurt yet and a large part of that growth will be driven by the unique search patterns of every individual user of a search engine toolbar or desktop search appliance. Without radically overstepping the boundaries of the spirit of personal privacy laws (which differ from nation to nation), search engines have been gathering veritable gold mines of information on every registered individual's searching habits. In other words, your machine has a number and that number is you, or at least it is representative of the most frequent user's search habits.
This expansion, at least as it relates to the world of search, is based on two fundamental premises.
The first and most important premise is that an individual's surfing habits can determine user-specific information to be served to them. Those folks who plant ad-ware and spyware through free software downloads aren't the only ones interested in knowing where you've been going. Google, Yahoo, MSN, ASK, and every other search engine that releases a toolbar are also pretty keen on knowing what you are interested in. It helps them send the right paid advertising to your search-browser, thus increasing the likelihood of successful conversions for their clients. There is a lot of business interest in delivery of advertising information directly to individuals and thus, localization and personalization go hand in hand. It is highly unlikely that the major search engines share your personal information with other commercial interests, especially considering the competitive advantage having such information gives them.
The second premise is that the Internet as we know it today will expand into an electronic meta-verse which can be cataloged by the search engines. Distribution of print media is a slowly dying business and all other forms of information or entertainment can be recorded or broadcast electronically. In the near future, information and entertainment choices will be presented to consumers primarily through listings based on search results. In many ways, this world already exists. Barring any sudden disasters, consolidation and convergence will start to wrap it up in easy to perceive packages within the next two years.
This emerging electronic meta-verse will include television programming, music delivery, radio-format broadcasting, first-run movies and live events. It will also include tens of millions of independent creations as technology advances to allow anyone with (or without) talent to produce and web-cast their own media. For an early experiment in merging media through a search-driven medium (without ad-content), check out www.zed.cbc.ca/. While this example has obvious commercial limitations, it illustrates the general concept of electronic meta-media.
It's only a matter of time before search engines themselves present increasingly specific directories of cultural fare, some of which they will be producing themselves. In a short time, those directories will become spidered databases and those databases will become second cousins to the search tools of today. The search-driven nature of the emerging meta-media universe offers a virtually limitless amount of advertising space and will make the search firms (and those who do business with, for, or on them) into the dominant players in the advertising industry. It will also change the nature of the relationship search engines currently have with those displaying paid advertising such as AdWords and Overture ads. If you thought 2004 was an interesting year in the business of search, wait to see what's coming in the next few years. By the time Beijing hosts the 2008 Olympics, the major TV networks will have adapted.
Google "...is big. Really Big. You just won't believe how vastly hugely mind-bogglingly big it is." (excerpt from The Hitchhiker's Guide to the Galaxy)
Google is the most powerful information resource humans have ever constructed. The power of any major search tool boggles the mind, but considering the vastness of Google's complex simplicity can truly hurt one's brain. With over 8 billion references in its rapidly growing, organically generated index, Google sets the standards other search engines follow. Benefiting from a three-year reign as the undisputed leader of search, Google has had a very good year and looks poised to make 2005 an even better one.
In 2004, Google introduced more new and improved applications for its users than any other tech company, posted one of the most successful IPOs in business history in a most unorthodox Dutch-Auction format, and met or exceeded any challenges its rivals threw at it.
Google is no longer just a search engine; it is an advertising machine. Drawing about 90% of its revenues from paid advertising and contextual ad-delivery, Google has had two major focuses this quarter. The first is to increase the number of places paid advertising might show up. The second is to develop new products and features that will retain current user loyalty and win new users from the other search firms. Both initiatives rely heavily on Google's reputation for delivering fast, free and relevant search results. Google has the world's largest database of indexed websites and it acquires site information through its spider, GoogleBot.
GoogleBot is probably the most well-known spider working the web today. It is also likely among the most analyzed applications ever written. On one level, GoogleBot is quite simple and can be depended on to act in a very specific manner. GoogleBot lives to follow links. GoogleBot will often chase down a link-path until it can no longer work its way deeper into a site. It will also work its way through any site linked to from any other site. Google finds the majority of new sites in its index by following links from established sites. If a link exists, Google will (A) find it, (B) follow it, (C) record every bit of information it can possibly record, and (D) weigh that information against a fairly rigid algorithm to determine the perceived topic or theme of a site for future reference. If a site in Google's index is modified or changes, Google will re-spider the site as quickly as it possibly can.
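The find-and-follow behaviour described above can be sketched in a few lines. This is only a toy model under our own assumptions: the web is represented as an in-memory dictionary of page-to-links, the visiting order is breadth-first purely for simplicity, and nothing here reflects how GoogleBot is actually implemented.

```python
from collections import deque


def crawl(link_graph: dict[str, list[str]], seeds: list[str]) -> list[str]:
    """Visit every page reachable from `seeds` by following links.

    `link_graph` maps a page to the pages it links to (a stand-in for
    fetching the page and extracting its links).
    """
    queue = deque(dict.fromkeys(seeds))  # de-duplicate seeds, keep their order
    seen = set(queue)
    visited = []
    while queue:
        page = queue.popleft()
        visited.append(page)  # "record" step: a real spider would store the page
        for target in link_graph.get(page, []):
            if target not in seen:  # follow each newly discovered link once
                seen.add(target)
                queue.append(target)
    return visited
```

The practical lesson is visible in the model: a page that nothing links to never enters the queue, so it is never found.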
GoogleBot's mission is to create a snap-shot of the World Wide Web and store it across Google's network of data centers around the world. When you reference information from Google, the results you see reflect Google's most recent snap-shot of the web. Parts of that snap-shot might be hours or even weeks old but overall the index is updating itself every minute of every day, 24/7. The fastest way to see exactly what Google views as the most recent version of your site is to click on the “Cached” link generally below the main link-reference Google displays for your site.
How GoogleBot behaves as it acquires sites is one thing. What Google does with the information its bot gathers is another thing. Google's method of ranking websites is extremely (and increasingly) complex. To understand how Google works today, a brief (and oversimplified) explanation of the principle of PageRank is in order.
Google was originally developed as a means of finding information in research documents at Stanford University, where its inventors Larry Page and Sergey Brin met as grad students. PageRank was developed as the basic sorting algorithm for their search tool (then known as Backrub) and was based on a very simple concept: trust.
Page and Brin understood that documents on the Internet could be linked together. They speculated that if someone took the time to code a link (by hand in those days) to another document, there was likely a relevance between the two documents. Why else would one researcher link to another researcher's work? Simply put, the more incoming links a particular document has, the better it would rank when sorted by PageRank. Given the environment in which it was developed, Google's genesis proved to be the perfect tool for intelligent users. Transferring that simplicity from a dorm room at Stanford to practically every living room and office space on Earth has been a great challenge for Google's engineers. While it is still somewhat based on the original, “democratic” nature of PageRank, Google's sorting algorithm has become far more complicated.
Google continues to weigh the number of links directed towards a site as positive indicators that there is relevant information to be found there. Since links are the veins and arteries of the web, links continue to be the most important factor influencing Google's perception of the relevance of a website. Because the Google index has grown so rapidly over the past six years, and search engine marketers have learned how to use Google's behaviours to influence rankings, Google now weighs several other factors when considering the relevance of a site, but the core of the algorithm remains rooted in PageRank.
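For readers who want to see the "links as votes" idea in motion, here is a minimal sketch of the textbook form of PageRank. It is the simplified, published version of the idea, not Google's current ranking algorithm; the damping value of 0.85 and the handling of pages with no outgoing links are conventional assumptions on our part.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Textbook PageRank by repeated redistribution of rank along links.

    `links` maps each page to the pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its
                # rank evenly across all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

With three pages where "a" and "b" both link to "c" and "c" links back to "a", page "c" ends up with the highest score and "b", which nobody links to, with the lowest.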
Not all Links are Created Equal
Back in the good old days, seven or eight years ago at Stanford, one link could represent one positive vote. As marketers learned to manipulate links, Google learned to apply different standards and measures when looking at those links and the content of sites in its index. Today, Google considers different links in different ways. As a matter of interest, our recent studies show that Google displays fewer back-links for sites than any other search engine, leading us to conclude that Google has become much stricter about how it views and values incoming links.
Google looks at a number of factors when determining the value of a link. Where the link originates from is as important as where the link is directed in Google's eyes. Google, like its rivals, is trying to find relationships between documents aside from obvious keywords. Google has the ability to fundamentally understand documents in its index and determine the topic, theme or context of those documents. This is an important measure as Google is becoming increasingly strict about link-relevance. To receive a highly positive response from Google, the pages or sites linked together must somehow relate to each other in topic as well as by sharing similar keywords. An excellent example would be in regional tourism.
A local tourism bureau will almost certainly have a website. That site will link to the sites of member-clients in its region. Each of those sites represents a business dependent on regional tourism, thus establishing relevance between the sites. The tourism bureau becomes the “hub” from which Google follows links to other, topically related websites. In this way, the hub site becomes a highly positive link-reference in Google's eyes.
The very best links, in Google's eyes, come from “authority sites”. An authority site is one that is very well established and respected, such as mainstream news sites (CNN, TIME, NYTimes, etc.), other search directories, industrial leaders (Macromedia, HP, Pitney Bowes, Nike, etc.), and other highly credible sources such as the regional tourism bureau mentioned above. While a website doesn't necessarily have to represent a large corporation to be considered an authority site, the sheer number of pages and references, combined with the high visitor numbers generally associated with large corporate sites, helps. Some personal Blogs, smaller companies and alternative news sources/blogs have also enjoyed “authority” status. This status is, in some ways, flexible and situational. A link from the tourism bureau mentioned above will not tend to help a business outside of its region unless a tangible relevancy factor is somehow introduced.
In practical terms, the “authority” status of a website is irrelevant for SEOs as the vast majority of sites in Google's index are just regular, run of the mill websites run by regular, run of the mill folks like us. Small businesses, researchers, governments, NGOs, musicians, artists, families, hobbyists and others write websites to offer the world access to their information. Virtually all of these sites contain links of some sort or another and the vast majority of those links lead to topically relevant documents. Although these are not “authority” sites, Google still considers their links extremely important when sorting and ranking sites. Again, the stress is on topical relevancy as Google places enormous value in good, solid links.
Google does not live on links alone
Much has been written in this article and thousands of others about Google and links. If links were the only factor Google looks at, the SEO business would not exist and Google's index would be as off-kilter as a Batman set. As stated in previous paragraphs, Google has the ability to read sites and understand what it is reading. Google is able to reference a world of information when figuring out the context of text used in Titles, Meta Tags, Body Text and Anchor Links. Since we know that Google is actually reading and comprehending content, we need to place specific content in places we know GoogleBot likes to look for it. Writing and placing this information is where SEO becomes an artful science that stems from simple common sense. Think about what Google knows about your website before it even visits.
It finds your site by following links. Therefore it "assumes" your site is topically relevant to the site it acquired the link to your site from. Google knows the address of the site, the URL. It also knows what anchor text the original linking site used when phrasing the link to your website. Keyword enrichment of both elements is beneficial with Google. In other words, if you can, use a target keyword phrase in the URL of your site, and request that others linking to your site use your target keyword phrases as the anchor text of links directed to your site.
Once Google hits your site, it learns a lot more very quickly. It sees the title, tags, text and links, and records these elements as it moves through the site. These are the basic elements SEOs examine and modify when working on your site.
The first thing GoogleBot sees is the title of the site. Keyword-enriched titles are very useful but webmasters are cautioned to be very conservative in the number of keywords or phrases they place in the title of a page. We generally use two or three keyword phrases when writing titles. Page titles should be page specific with keywords focused on the topic of the page. The second (or third) keyword set in the title is used to provide an overall context to the site. For example: title="Blue Widgets :: Preformed Blocks and Spacers :: Construction Materials". Overloading the title with keywords is useless and may be considered spam in extreme cases.
Next, Google looks at the meta tags. Unless you wish to exclude Google from sections of your site, there are only two really important meta tags: the description and the keywords tags. Of these two, the description is the more important. Google uses the description tag as a topical reference and may draw from the description tag when generating the two to three sentence site description shown under links in the SERPs. As with titles, each page should have a page-specific description tag that outlines the topic of that page and the theme of the overall site. The keywords tag is of much lesser importance but is still considered to carry minor weight. Mentioning keywords that might be associated with your website, including common misspellings, doesn't hurt. Packing the keywords tag with dozens of mentions of the same word, or using keywords that do not relate to your website, might. We still use the keyword tag on client sites and still use page-specific keyword tags.
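To review titles and meta tags across a site, it helps to pull them out of each page the way a spider would. The sketch below uses Python's standard html.parser module to do that. The sample page reuses the Blue Widgets title from above, but its description and keywords are made up for illustration, and splitting the title on "::" simply follows the separator used in our example.

```python
from html.parser import HTMLParser


class HeadElements(HTMLParser):
    """Collects the <title> text and named <meta> tags from a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            fields = dict(attrs)
            name, content = fields.get("name"), fields.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def head_elements(html):
    """Return the title, its '::'-separated phrases, and the two meta tags."""
    parser = HeadElements()
    parser.feed(html)
    parser.close()
    title = parser.title.strip()
    keywords = parser.meta.get("keywords", "")
    return {
        "title": title,
        "title_phrases": [p.strip() for p in title.split("::") if p.strip()],
        "description": parser.meta.get("description", ""),
        "keywords": [k.strip() for k in keywords.split(",") if k.strip()],
    }


# Illustrative page: the description and keywords below are invented.
SAMPLE = """<html><head>
<title>Blue Widgets :: Preformed Blocks and Spacers :: Construction Materials</title>
<meta name="description" content="Preformed blue widget blocks and spacers for construction.">
<meta name="keywords" content="blue widgets, preformed blocks, spacers">
</head><body></body></html>"""

info = head_elements(SAMPLE)
```

A page whose title_phrases list runs much longer than three entries, or whose description is empty, is a reasonable candidate for a second look.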
After the meta tags, Google looks at page content or body text. Again, relevance is extremely important. The Internet is a very big place and Google's index is pretty big itself. Finding documents in an 8-billion page universe requires precision. Webmasters can help themselves by simply addressing one topic or issue per page. Google is extremely intelligent and intuitive, but even the smartest robots get confused. Keeping it simple for GoogleBot makes good ranking much simpler to achieve for your site. As Google reads information from left to right in columns, like we read a newspaper, placing your keyword phrases early in the body text of pages in your site is very beneficial. Well-written sentences that are topically focused are the best spider food for Google as it has become wary of words that "float" on a page without supporting words to provide context.
Lastly, GoogleBot comes back to links. GoogleBot moves through your website following links you place there. It reads the text that phrases the links to determine what it might find when it gets to the next page. For example, the second page in most websites is the "About Us" page. Billions of websites use "About Us" as the anchor text linking the index page to the about us page. A better link would read About "Blue Widgets Inc." as the keyword phrase Blue Widgets is used as the anchor text from one page to the next. Keyword enrichment of anchor text also affects Google's perception of external links. Going back to our tourism bureau example, a link to a local bed and breakfast might read "Humboldt House" Bed and Breakfast or it might read Humboldt House "Victoria – Bed and Breakfast". The anchor text used in the second example would be far more beneficial than the first.
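A quick way to audit anchor text is to list every link on a page along with the words that phrase it, then flag the generic ones. The sketch below does this with Python's standard html.parser module. The GENERIC_ANCHORS list is our own illustrative assumption; extend it with whatever bland phrases turn up on your site.

```python
from html.parser import HTMLParser

# Illustrative list of anchor phrases that carry no keywords.
GENERIC_ANCHORS = {"about us", "click here", "home", "more"}


class AnchorCollector(HTMLParser):
    """Records (href, anchor text) for every <a href=...> on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the anchor text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def anchor_links(html):
    """Return every (href, anchor text) pair found in the page."""
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    return collector.links


def generic_anchors(html):
    """Return only the links whose anchor text is on the generic list."""
    return [(href, text) for href, text in anchor_links(html)
            if text.lower() in GENERIC_ANCHORS]
```

Run against a page containing both versions of the about-page link, only the plain "About Us" link is flagged; the keyword-enriched one passes.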
Remember, links provide the pathway for GoogleBot and other spiders. A final element that should be included on all pages is a text-based sitemap that links to all pages in the site and is linked to from the Home or INDEX page.
In a nutshell, that's how GoogleBot examines a site. Here is a quick rundown of which elements GoogleBot is looking for:
Relevant Incoming Links
Good URLs that are not too spammy
Easy to follow link paths including a sitemap
Keyword enriched titles
Well written Description Meta Tag
Well written Keywords Meta Tag (less important than Description)
Google "...is big. Really Big. You just won't believe how vastly hugely mind-bogglingly big it is." (excerpt from The Hitchhiker's Guide to the Galaxy)
Google is the most powerful information resource humans have ever constructed. The power of any major search tool boggles the mind but considering the vastness of Google's complex simplicity can truly hurt one's brain. With over 8-billion references in its rapidly growing, organically generated index, Google sets the standards other search engines follow. Benefiting from a three year reign as the undisputed leader of search, Google has had a very good year and looks poised to make 2005 an even better year.
In 2004, Google introduced more new and improved applications for its users than any other tech company, posted one of the most successful IPO's in business history in a most unorthodox Dutch-Auction format, and met or exceeded any challenges its rivals threw at.
Google is no longer just a search engine, it is an advertising machine. Drawing about 90% of its revenues from paid advertising and contextual ad-delivery, Google has had two major focuses this quarter. The first is increasing the number of places paid-advertising might show up. The second is to develop new products and features that will retain current user loyalty and win new users from the other search firms. Both initiatives rely heavily on Google's reputation for delivering fast, free and relevant search results. Google has the world's largest database of indexed websites and it acquires site information through its spider GoogleBot.
GoogleBot is probably the most well-known spider working the web today. It is also likely among the most analyzed applications ever written. On one level, GoogleBot is quite simple and can be depended on to act in a very specific manner. GoogleBot lives to follow links. GoogleBot will often chase down a link-path until it can no longer work its way deeper into a site. It will also work its way through any site linked to from any other site. Google finds the majority of new sites in its index by following links from established sites. If a link exists, Google will (A) find it, (B) follow it, (C ), record every bit of information it can possibly record, and (D) weigh that information against a fairly rigid algorithm to determine the perceived topic or theme of a site for future reference. If a site in Google's index is modified or changes, Google will re-spider the site as quickly as it possibly can.
GoogleBot's mission is to create a snap-shot of the World Wide Web and store it across Google's network of data centers around the world. When you reference information from Google, the results you see reflect Google's most recent snap-shot of the web. Parts of that snap-shot might be hours or even weeks old but overall the index is updating itself every minute of every day, 24/7. The fastest way to see exactly what Google views as the most recent version of your site is to click on the “Cached” link generally below the main link-reference Google displays for your site.
How GoogleBot behaves as it acquires sites is one thing. What Google does with the information its bot gathers is another thing. Google's method of ranking websites is extremely (and increasingly) complex. To understand how Google works today, a brief (and over simplified) explanation of the principle of PageRank is in order.
Google was originally developed as a means of finding information in research documents at Stanford University where its inventors Larry Page and Sergey Brin met as grad students. PageRank was developed as the basic sorting algorithm for their search tool (then known as Backrub) and was based on a very simple concept, trust.
Page and Brin understood that documents on the Internet could be linked together. They speculated that if someone took the time to code a link (by hand in those days) to another document there was likely a relevance between the two documents. Why else would one researcher link to another researcher's work? Simply put, the more incoming links a particular document has, the better it would rank when sorted by PageRank. Given the environment in which it was developed, Google's genesis proved to be the perfect tool for intelligent users. Tranfering that simplicity from a dorm room at Stanford to practically every living room and office space on Earth has been a great challenge for Google's engineers. While it is still somewhat based on the original, “democratic” nature of PageRank, Google's sorting algorithm has become infinitely more complicated.
Google continues to weigh the number of links directed towards a site as positive indicators that there is relevant information to be found there. Since links are the veins and arteries of the web, links continue to be the most important factor influencing Google's perception of the relevance of a website. As the Google index has grown so rapidly over the past six years, and search engine marketers have learned how to use Google's behaviours to influence rankings, Google weighs several other factors when considering the relevance of a site but the core of the algorithm remains rooted in PageRank.
Not all Links are Created Equal
Back in the good old days, seven or eight years ago at Stanford, one link could represent one positive vote. As marketers learned to manipulate links, Google learned to apply different standards and measures when looking at those links and the content of sites in its index. Today, Google considers different links in different ways. As a matter of interest, our recent studies show that Google displays less back-links for sites than any other search engine, leading us to conclude that Google has become much stricter about how it views and values incoming links.
Google looks at a number of factors when determining the value of a link. Where the link originates from is as important as where the link is directed in Google's eyes. Google, like its rivals, is trying to find relationships between documents aside from obvious keywords. Google has the ability to fundamentally understand documents in its index and determine the topic, theme or context of those documents. This is an important measure as Google is becoming increasingly strict about link-relevance. To receive a highly positive response from Google, the pages or sites linked together must somehow relate to each other in topic as well as by sharing similar keywords. An excellent example would be in regional tourism.
A local tourism bureau will almost certainly have a website. That site will link to the sites of member-clients in its region. Each of those sites represent businesses dependent on regional tourism, thus establishing relevance between the sites. The tourism bureau becomes the “hub” from which Google follows links to other, topically related websites. In this way, the Hub site becomes a highly positive link-reference in Google's eyes.
The very best links, in Google's eyes, come from “authority sites”. An authority site is one that is very well established and respected such as mainstream news sites (CNN, TIME, NYTimes, etc...) other search directories, industrial leaders (Macromedia, HP, Pitney Bowes, Nike, etc...), and other highly credible sources such as the regional tourism bureau mentioned above. While a website doesn't necessarily have to represent a large corporation to be considered an authority site, the sheer number of pages and references, combined with high visitor numbers generally associated with large corporate sites helps. Some personal Blogs, smaller companies and alternative news sources/blogs have also enjoyed “authority” status. This status is, in some ways, flexible and situational. A link from the tourism bureau mentioned above will not tend to help a business outside of its region unless a tangible relevancy factor is somehow introduced.
In practical terms, the “authority” status of a website is irrelevant for SEOs as the vast majority of sites in Google's index are just regular, run of the mill websites run by regular, run of the mill folks like us. Small businesses, researchers, governments, NGOs, musicians, artists, families, hobbyists and others write websites to offer the world access to their information. 99.999999% of these sites contain links of some sort or another and the vast majority of those links lead to topically relevant documents. While not “authority” sites, Google still considers these links extremely important when sorting and ranking sites. Again, the stress is on topical relevancy as Google places enormous value in good, solid links.
Google does not live on links alone
Much as been written in this article and thousands of others about Google and links. If links were the only factor Google looks at, the SEO business would not exist and Google's index would be as off-kilter as a Batman set. As stated in previous paragraphs, Google has the ability to read sites and understand what it is reading. Google is able to reference a world of information when figuring out the context of text used in Titles, Meta Tags, Body Text and Anchor Links. Since we know that Google is actually reading and comprehending content, we need to place specific content in places we know GoogleBot likes to look for it. Writing and placing this information is where SEO becomes an artful science that stems from simple common sense. Think about what Google knows about your website before it even visits.
It finds your site by following links. Therefore it "assumes" your site is topically relevant to the site it acquired the link to your site from. Google knows the address of the site, the URL. It also knows what anchor text the original linking site used when phrasing the link to your website. Keyword enrichment of both elements is beneficial with Google. In other words, if you can, use a target keyword phrase in the URL of your site, and request that others linking to your site use your target keyword phrases as the anchor text of links directed to your site.
Once Google hits your site, it learns a lot more very quickly. It sees the title, tags, text and links, and records these elements as it moves through the site. These are the basic elements SEOs examine and modify when working on your site.
The first thing GoogleBot sees is the title of the site. Keyword enriched titles are very useful but webmasters are cautioned to be very conservative in the number of keywords or phrases they place in the title of a page. We generally use two or three keyword phrases when writing titles. Page titles should be page specific with keywords focused on the topic of the page. The second (or third) keyword set in the title is used to provide an overall context to the site. For example,
Overloading the title with keywords is useless and may be considered spam in extreme cases.
Next, Google looks at the meta tags. Unless you wish to exclude Google from sections of your site, there are only two really important meta tags, the description and the keywords tags. Of these two, the description is the most important. Google uses the description tag as a topical reference and may draw from the description tag when generating the two to three sentence site description shown under links in the SERPs. As with titles, each page should have a page specific description tag that outlines the topic of that page and the theme of the overall site. The keywords tag is of much lesser importance but is still considered to carry minor weight. Mentioning keywords that might be associated with your website, including common misspellings doesn't hurt. Packing the keywords tag with dozens of mentions of the same word, or using keywords that do not relate to your website might. We still use the keyword tag on client sites and still use page-specific keyword tags.
After the meta tags, Google looks at page content or body text. Again, relevance is extremely important. The Internet is a very big place and Google's index is pretty big itself. Finding documents in an 8-billion page universe requires precision. Webmasters can help themselves by simply addressing one topic or issue per page. Google is extremely intelligent and intuitive, but even the smartest robots get confused. Keeping it simple for GoogleBot makes good ranking much simpler to achieve for your site. As Google reads information from left to right in columns, like we read a newspaper, placing your keyword phrases early in the body text of pages in your site is very beneficial. Well-written sentences that are topically focused are the best spider food for Google, as it has become wary of words that "float" on a page without supporting words to provide context.
Lastly, GoogleBot comes back to links. GoogleBot moves through your website following links you place there. It reads the text that phrases the links to determine what it might find when it gets to the next page. For example, the second page in most websites is the "About Us" page. Billions of websites use "About Us" as the anchor text linking the index page to the about us page. A better link would read About "Blue Widgets Inc." as the keyword phrase Blue Widgets is used as the anchor text from one page to the next. Keyword enrichment of anchor text also affects Google's perception of external links. Going back to our tourism bureau example, a link to a local bed and breakfast might read "Humboldt House" Bed and Breakfast or it might read Humboldt House "Victoria – Bed and Breakfast". The anchor text used in the second example would be far more beneficial than the first.
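One simple way to spot weak anchor text on your own pages is to compare each link's text against a short list of generic phrases. The sketch below does just that. Only "About Us" comes from the example above; the other phrases in GENERIC_ANCHORS, and the function name generic_links, are our own assumptions and should be adjusted to suit your site.

```python
# Assumed list of generic anchor phrases; extend or trim as needed.
GENERIC_ANCHORS = {"about us", "click here", "home", "more", "read more"}


def generic_links(links):
    """Return the (href, anchor text) pairs whose anchor text is generic.

    links is a list of (href, anchor text) pairs.
    """
    return [
        (href, text)
        for href, text in links
        if text.strip().lower() in GENERIC_ANCHORS
    ]
```

Any link the function returns is a candidate for rewording with a keyword phrase that describes the page it points to.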
Remember, links provide the pathway for GoogleBot and other spiders. A final element that should be included on all pages is a text-based sitemap that links to all pages in the site and is linked to from the Home or INDEX page.
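A text-based sitemap can be as plain as an HTML list of text links, one for each page. Here is a minimal sketch that builds one from a list of pages. The function name text_sitemap and the sample URLs in the usage note are hypothetical, and the output is only one of many workable layouts.

```python
from html import escape


def text_sitemap(pages):
    """Build an HTML list of plain text links, one per page.

    pages is a list of (url, anchor text) pairs.
    """
    items = [
        f'  <li><a href="{escape(url, quote=True)}">{escape(text)}</a></li>'
        for url, text in pages
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Calling text_sitemap([("/index.html", "Blue Widgets Home"), ("/about.html", "About Blue Widgets Inc.")]) returns a two-item list that can be pasted into a sitemap page, which in turn should be linked from the Home or INDEX page.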
In a nutshell, that's how GoogleBot examines a site. Here is a quick rundown of which elements GoogleBot is looking for:
Relevant Incoming Links
Good URLs that are not too spammy
Easy to follow link paths including a sitemap
Keyword-enriched titles
Well-written Description Meta Tag
Well-written Keywords Meta Tag (less important than Description)