This morning the Google Blog announced a staggering milestone that astounded even Google's own engineers: Google has found 1 trillion unique URLs on the web at once.
This statistic not only shows the amazing power of Google's indexing engines but also outlines just how massive the Internet has become... it is truly mind-boggling.
The full blog posting is located here but the following are some interesting segments quoted from the blog posting.
How big is the Net if that is what Google manages to index? "Strictly speaking, the number of pages out there is infinite -- for example, web calendars may have a "next day" link, and we could follow that link forever, each time finding a "new" page. We're not doing that, obviously, since there would be little benefit to you."
Are all of those 1 trillion unique URLs indexed? "We don't index every one of those trillion pages -- many of them are similar to each other, or represent auto-generated content similar to the calendar example that isn't very useful to searchers."
How does Google manage so much information? "Google downloads the web continuously, collecting updated page information and re-processing the entire web-link graph several times per day. This graph of one trillion URLs is similar to a map made up of one trillion intersections. So multiple times every day, we do the computational equivalent of fully exploring every intersection of every road in the United States. Except it'd be a map about 50,000 times as big as the U.S., with 50,000 times as many roads and intersections."
So at this point it is safe to say that the next milestone of 1 quadrillion (that is, 1,000 trillion unique URLs) won't be far off at the rate Google's capacity is growing, and the future orders of magnitude are even more mind-boggling. Isn't it wonderful to live in a world so connected that this kind of information can be available at our fingertips? Zowweee!! What other word is there to describe such nirvana :-)
June 30th 2008 was a day that Flash developers had been awaiting for a long time: Google and Adobe finally announced that Flash .swf files could be crawled by Google! In fact, the extensive news release from the Adobe Developer Center also stated that Yahoo would be incorporating similar technology in short order. When I read this news and the subsequent articles from the web marketing community, it became very clear that this update was a great step but far from the fix that some Flash developers are likely to pitch to their clients. As a result, I wanted to add my voice to the buzz on this topic and share my thoughts on how to optimize a site using Flash while considering the current updates.
What is Flash?
Okay, let's get down to basics. To introduce and establish what Flash is all about I am going to fall back on Wikipedia for a concise description:
"Adobe Flash (previously called Shockwave Flash and Macromedia Flash) is a set of multimedia technologies developed and distributed by Adobe Systems and earlier by Macromedia. Since its introduction in 1996, Flash technology has become a popular method for adding animation and interactivity to web pages; Flash is commonly used to create animation, advertisements, and various web page components, to integrate video into web pages, and, more recently, to develop rich Internet applications." Source, Wikipedia
BEFORE: Search Engines Could Not Crawl Flash
Up until recently the textual content found in .swf Flash files was, for all intents and purposes, just as unreadable for search engine spiders as the text in images; only HTML text on a page could be read and indexed by search engine spiders because they could not yet (and still cannot) conduct on-the-fly optical character recognition.
To explain this differently I think of the HTML that spiders can read like the braille-like feeling of running your finger over a letter written in ball point pen; you can feel the contour of writing. Whereas something unreadable like Flash or an image on a page is like running your fingers along a 4x6 picture of a road sign... you won't feel anything, so by the same token the text on that road sign cannot be read by a search engine spider.
NOW: Search Engines Can Crawl Text in Flash
For the first time, on June 30th, 2008, Google announced it could accurately spider the textual content hidden within Flash files found on the Internet. This major announcement was enabled by a partnership between Adobe, Google and Yahoo where Adobe provided their proprietary Flash Player technology to the search engines so they could integrate it into their systems and successfully 'read' the content within Flash files. This technology has vast implications for Google's and soon Yahoo's indexes because, at least in Google's case, this allows the search engine to index the content within over 70.4 million Flash (SWF) files. That is a vast amount of content that was previously inaccessible to the search engines and the ability to access it could add a lot of value for search engine users.
For example, an inspiring and eloquent Flash site like Forests Forever could be indexed which would expose more viewers to a website that provides a wonderful introduction to the world's forests. Of course that is just one Flash site of many that will add value to search engines when indexed; it just happens to be one of my personal favourites.
Search Engine Optimization Now Possible with Flash
The implementation of Flash crawling technology means that the text within Flash can now be indexed and links can be followed. Here are some examples of the basic optimization that is now possible within Flash:
Optimizing page content for specific keyphrase(s) to ensure a visiting search engine bot will correctly perceive the page's topic.
Using keywords within internal links to pass link juice from page to page; only applicable for sites where the Flash pages are split across separate URLs.
Providing emphasis (bolding) to particular words may help to emphasize keyphrase(s); but I am reaching here... it is unknown if this new technology provides text-importance recognition.
The Limitations of Flash Search Engine Optimization
Now that you have some idea of what can be optimized for search engines, here are a few pitfalls that still limit the search engine friendliness of Flash:
Single URL Flash Websites: Many websites I encounter still incorporate all of the website in a single Flash file; in other words as a user navigates the site they are still using the same URL but different pages appear. In such an instance the search engines will index the content and potentially drive traffic to the site but as Google cannot link to content within a Flash file all users will be sent to the beginning of the file. That type of indirect search result is likely to infuriate many searchers who have come to expect immediate results.
Here is a quote from Google's comment area on this topic:
"We’ve heard requests for deep linking (linking to specific content inside file) not just for Flash results, but also for other large documents and presentations. In the case of Flash, the ability to deep link will require additional functionality in Flash with which we integrate."
That last line is interesting because it leaves room for interpretation. Do they mean Adobe will have to add the "additional functionality" to Flash or that Google needs to beef up their indexing technology to take advantage of the existing Flash functionality? Perhaps some Flash gurus out there could weigh in on this one. It is definitely an ambiguous way for Google to answer the question.
If you need a workaround to deep-link into a single SWF file, Adobe notes a solution: "you can create multiple HTML files that provide different variables to the SWF and start your application at the correct subsection. By creating multiple entry points, you can get the benefits of a site that is indexed as a suite of pages but still only need to manage one copy of your application."
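To make Adobe's suggestion a little more concrete, here is a minimal sketch of how those multiple entry points might be generated. Everything specific in it is an assumption for illustration only: the file name site.swf, the section names, and a variable called "section" that the Flash application would have to read for itself on startup. The only real mechanism relied on is the standard FlashVars parameter for passing variables from the HTML page to the SWF.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical values: the SWF name, the section list, and the "section"
# variable are placeholders, not anything specified by Adobe or Google.
SWF_FILE = "site.swf"
SECTIONS = {
    "home": "Welcome",
    "products": "Our Products",
    "contact": "Contact Us",
}

PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head><title>{title}</title></head>
<body>
<object type="application/x-shockwave-flash" data="{swf}" width="800" height="600">
  <param name="movie" value="{swf}">
  <param name="FlashVars" value="{flashvars}">
</object>
</body>
</html>
"""


def build_entry_page(section, title, swf=SWF_FILE):
    """Return the HTML for one entry point that starts the SWF at `section`."""
    flashvars = urlencode({"section": section})
    return PAGE_TEMPLATE.format(
        title=escape(title),
        swf=escape(swf, quote=True),
        flashvars=escape(flashvars, quote=True),
    )


def build_all_pages(sections):
    """Map an output file name (e.g. products.html) to its HTML."""
    return {
        f"{name}.html": build_entry_page(name, title)
        for name, title in sections.items()
    }


if __name__ == "__main__":
    # List the entry points that would be written out, one per section.
    for filename in build_all_pages(SECTIONS):
        print(filename)
```

Each generated page has its own URL and title but embeds the same SWF, so there is still only one copy of the application to maintain. The Flash file itself would need to read the variable and jump to the matching subsection; that part is outside this sketch.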
Text in Images is Not Indexed: Many Flash websites inexplicably incorporate a great deal of textual content within images, and currently search engines cannot index text in images; I expect that will remain true for at least another year or two. As a result, a Flash website that includes a vast amount of text within graphics will not see a noticeable benefit from this enhanced crawling technology.
Resource-File Based Content Not Indexed: I noted this in Google's comment area from their support team: "At this time, content loaded dynamically from resource files is not indexed. We’ve noted this feature request from several webmasters -- look for this in a near future update."
In addition, Google's news release announced the following limitations to Flash that Google expects to surmount soon (quoted from Google blog):
"Googlebot does not execute some types of JavaScript. So if your web page loads a Flash file via JavaScript, Google may not be aware of that Flash file, in which case it will not be indexed."
"We currently do not attach content from external resources that are loaded by your Flash files. If your Flash file loads an HTML file, an XML file, another SWF file, etc., Google will separately index that resource, but it will not yet be considered to be part of the content in your Flash file."
"While we are able to index Flash in almost all of the languages found on the web, currently there are difficulties with Flash content written in bidirectional languages. Until this is fixed, we will be unable to index Hebrew language or Arabic language content from Flash files."
Verdict: SEO for Flash is Still in Diapers
It is wonderful news that Flash is becoming more search engine friendly and there is no question that the addition of previously unattainable Flash content to search engine indexes will prove valuable. But the fact of the matter is that at this moment I wouldn't dream of telling a client that Flash can be a competitive medium for search engine optimization. There are simply too many roadblocks that still need to be addressed before a Flash website has any hope of competing with an HTML website on the basis of search engine optimization alone. I do, however, see a couple of exceptions to the rule:
At a certain point a threshold can be met where significant incoming links can push even the most un-search engine friendly website to the top rankings. As a result, it is highly likely that some Flash websites with a decent incoming link support structure will see vast improvements in rankings when their content is finally considered thanks to this new crawling technology.
In less competitive arenas (obscure keyphrases or keyphrases with little competition) the basic search engine optimization capabilities opened to Flash may very well be all that is needed to attain top search engine rankings.
In conclusion I would like to pass on extreme kudos to Adobe, Google and Yahoo for working this new technology into their systems. With all of the new multimedia formats coming online it has always seemed quite silly to me that Flash, having been around for years, was still not fully indexable. Thankfully Flash can now be crawled and the day where it could potentially compete for competitive rankings is on the distant horizon.
I just finished writing a post for Search Engine Guide on Yahoo's latest idea that I think is positively brilliant. Here is the lead in if you would like to go there and read it in its entirety:
"At Yahoo Anecdotal today Yahoo announced they had recently opened the Yahoo Accessibility Lab; a place where only Yahoo employees (for now) can experience the world of the Internet as a disabled web surfer would. Read on for a peek at accessibility guidelines and more on what Yahoo had to say about this important issue."
Written by Scott Van Achte and published at 9:10 AM
Yahoo held its second quarter shareholder conference call Tuesday evening and the numbers are in.
For Q2 of 2008 Yahoo's revenues were $1.79 billion, a 6% increase over Q2 of 2007. Their cost of revenues was also up substantially, resulting in a gross profit up by only $18 million compared to 2007. Granted, a profit of more than a billion dollars is nothing to sneeze at, but $18 million in growth is really peanuts in this multi-billion dollar industry.
For a full rundown of all the figures or to listen to the conference call, visit Yahoo Investor Relations.
Written by Scott Van Achte and published at 6:45 AM
Rambler Media, a Russian company, has sold its contextual ad service, “Begun”, to Google for $140 million, according to Reuters and a Google press release issued Friday. As part of the deal, Rambler also agreed to use Google results and advertising on its Russian-based site, www.rambler.ru. Rambler's search results will be enhanced by Google, with Google ads displayed alongside.
"Google is very committed to giving Russian users, advertisers and partners the best possible service and experience," said Mohammad Gawdat, Managing Director Emerging Markets, Google. "This agreement will result in better search results and more relevant advertising for our Russian users and publishers."
Currently Rambler Media owns 50.1% of Begun and will be buying the remaining 49.9% from Bannatyne Limited before selling the entire firm to Google.
Written by Scott Van Achte and published at 4:17 PM
Google’s earnings are in after the second quarter report was issued yesterday. While their net income is up from the same period in 2007 by approximately $325 million ($925.1 million in Q2 of 2007, $1.25 billion in Q2 of 2008), this is not as high as was expected: Wall Street was expecting to see $4.74 per share, and actual earnings were $4.63 per share.
With the economy down on its luck, Google still managed to make more than $300 million more than it did in the same period last year, roughly a 35% increase.
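For anyone who wants to check the arithmetic behind that 35% figure, here is a quick sketch using only the two net income numbers quoted above (in millions of dollars). The function name percent_change is just a label I picked for illustration.

```python
def percent_change(old, new):
    """Relative change from old to new, expressed as a percentage."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old * 100


# Net income in millions of US dollars, as quoted in the post.
Q2_2007 = 925.1
Q2_2008 = 1250.0  # "1.25 Billion"

increase = Q2_2008 - Q2_2007
growth = percent_change(Q2_2007, Q2_2008)

print(f"Increase: ${increase:.1f} million ({growth:.1f}%)")
# prints "Increase: $324.9 million (35.1%)"
```

So the "approximately $325 million" and "roughly 35%" figures agree with each other.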
Microsoft's profits for Q2 are also up, to roughly 46 cents a share. They saw earnings in Q2 of 2007 at $3 billion, and an increase of 42% to $4.3 billion for 2008. These are impressive earnings; however, Microsoft’s online business did not help much in terms of this profit. Compared to 2007, the online business actually saw a loss of $488 million in Q2 of 2008!
Had Microsoft’s Internet division actually turned a profit, they would have achieved Wall Street's estimate of 47 cents per share, rather than their actual 46 cents.
Yahoo has not yet released their Q2 Earnings for this year. Their shareholder conference call to discuss earnings is scheduled for July 22 at 5:00pm ET.
Written by Scott Van Achte and published at 4:09 PM
comScore released the latest figures for search engine rankings and their respective market share Friday, and Google has actually seen a drop! For all you Yahoo and Microsoft fans out there, don’t get too excited, it’s quite small to say the least.
For June 2008 Google showed a 61.5% hold on market share, down 0.3 percentage points from May 2008. Yahoo also showed a 0.3-point change, only in the positive direction, moving from 20.6% up to 20.9%. Microsoft had the biggest leap, a whopping 0.7 points, up from 8.5% to 9.2%.
When comparing numbers from June 2007 with June 2008, Google has seen a sizable gain, stealing users from MSN and Yahoo. In the past 12 months Google has seen an increase of 6.6%, with Yahoo and Microsoft losing 2.9% and 3.1% respectively.
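A small note on the month-over-month figures: a move from 8.5% to 9.2% is a change of 0.7 percentage points, which is not the same as a 0.7% change in Microsoft's share. The sketch below works out both numbers from the May and June figures quoted above. Google's May figure of 61.8% is not stated in the comScore numbers I quoted; I derived it from "61.5%, down 0.3", so treat it as an assumption.

```python
# Search market share in percent. Google's May value is derived
# (61.5 + 0.3), not quoted directly; the rest are as quoted in the post.
MAY_2008 = {"Google": 61.8, "Yahoo": 20.6, "Microsoft": 8.5}
JUNE_2008 = {"Google": 61.5, "Yahoo": 20.9, "Microsoft": 9.2}


def point_change(before, after):
    """Difference between two shares, in percentage points."""
    return round(after - before, 1)


def relative_change(before, after):
    """Change in the share relative to its starting value, in percent."""
    return round((after - before) / before * 100, 1)


if __name__ == "__main__":
    for engine, may in MAY_2008.items():
        june = JUNE_2008[engine]
        print(
            f"{engine}: {point_change(may, june):+.1f} points, "
            f"{relative_change(may, june):+.1f}% relative"
        )
```

Seen this way, Microsoft's 0.7-point gain is a relative increase of about 8% in its own share, while Google's 0.3-point dip is only about half a percent of its share.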
Written by Scott Van Achte and published at 3:16 PM
ICANN has recently approved a proposal to expand the availability of top level domains to virtually unlimited levels – at least if your pockets are deep enough.
On Monday the Wall Street Journal reported that businesses, or even individuals with money to burn will be able to apply for top level domain names using just about anything they want as the suffix.
What this means is that, rather than the usual .com or .ca extensions, companies could purchase the rights to brand name extensions such as “.google” or “.ebay”, or if I were given a hefty raise and felt the need, I could spend a half million dollars and register “www.scott.vanachte” (although I am not sure what I would do with such a pricey piece of online real estate).
Don’t get too excited, however: according to ICANN it could be upwards of two years before the new domains are released, and with these vanity domains going for as much as a half million dollars, we are likely to see them used only by the big corporations until prices come down.
Marketing Sherpa is a company I have nothing but respect for due to their years of offering outstanding marketing handbooks and case studies.
Recently they updated one guide that just plain begged to be advertised, because it has information that can increase your online conversions by as much as 55%. The best part is that this isn't fluff... everything they say is backed up with hard proof based on case studies and test after test.
If you have a website that could use improved conversions (who doesn't?), check out the revised 2008 Landing Page Handbook without worrying about paying for something that won't work for you: they have a full money-back guarantee, so if you don't like it, send it back and get your money back. I wish everything could be that simple. Here is the info on the guide:
MarketingSherpa's Landing Page Handbook
-> Page Design & Copy Instructions
-> 54 Stat & Data Charts
-> 114 Samples of Landing Pages to Copy
-> Help for Search, Email, B-to-B, Ecommerce, Blogs & Lead Generation Conversions
Risk-Free: 100% money-back guaranteed
In Stock Now - Ships in 24 hours
Read a LOT more info on their page HERE including outlines of the many reports and findings that you receive when you buy the handbook.
I've had people ask me why I dislike using Internet Explorer so much.
I suppose part of the reason for my distaste is the behaviour Microsoft has displayed in the past and present. Forcing the use of their software has never sat well with me, particularly when the software in question is always so full of holes. Internet Explorer is infamous for bugs, ranging from major security flaws to simple annoyances and everything in between.
I just don't trust software that has always had so many obvious as well as not so obvious problems.
The example below pretty much sums up my lack of faith and explains why I use an alternate browser whenever possible.
I like to install Windows updates manually so I can see what I'm installing and only install what I think is absolutely necessary. Every time I do this, I get this same window popping up.
I can’t see a point to having a trusted sites list if I have to confirm this one every single time. It comes pre-installed on the Start Menu for Pete's sake!
Presumably the warning of the potential security risk here is that you are about to expose your system to more Microsoft software. :-D
Written by Scott Van Achte and published at 2:20 PM
I am not sure where I have been for the past two years, but today is the first time I have heard about Windows Live Expo - today is also the day I heard about its scheduled demise.
Apparently this classified ad platform was considered to be a major threat to Craigslist, and this month, on July 31, Microsoft will pull the plug on the classified service it launched back in February of 2006.
Considering I work online and spend at least 8 hours a day sitting in front of my computer, the fact that I had never heard of Windows Live Expo is probably a good indicator of why it was ultimately a failure: quite simply, Microsoft just didn’t do a very good job of getting the word out.
Last year, Microsoft adCenter introduced changes to their Ad ranking system for advertising in the US. This represented a shift towards a more Quality Based Ranking system, along the lines of what Google AdWords and Yahoo’s Panama already have in place.
It appears that this change is now expanding into more widespread coverage. On the adCenter blog on Wednesday, Microsoft announced the introduction of this system to the Canadian and UK markets.
While it may seem Microsoft is only playing “catch up” with this initiative, they have been busy making other improvements to their system as well.
Though still low on the totem pole for market share, Microsoft is forging ahead with a level of energy that would seem alien over in the Yahoo trenches. With projects such as the Excel addon, Desktop Editor and Analytics, Microsoft may give Google some real competition in the not too distant future.
A scan of the Yahoo blog yields a host of “how to” and “tips and tricks” posts, but nothing particularly significant in the way of badly needed innovation.
In fact the last significant improvement Yahoo has implemented was the change to minimum bids. While that has been handy for getting alerts when minimum bids are about to become too low, it’s not been the sort of improvement that painfully awkward interface really needs.
There is speculation that Microsoft is overtaking Yahoo’s position for paid traffic as well.
While I don’t know if this is indicative of anything, I have noticed a recent decrease in the number of impressions in our own Yahoo accounts. Unfortunately, the bid prices have not decreased as yet.
In a move that chills my bones, yesterday George Bush attained congressional approval to make "a massive expansion of the Foreign Intelligence Surveillance Act" (FISA). This security upgrade provides FISA with "the power to order Google, AT&T and Yahoo to forward to the government all e-mails, phone calls and text messages where one party to the conversation is thought to be overseas." Source, Wired Blog Network.
So what does this mean to you? Obviously this is being done to catch terrorists and protect the American people... a noble pursuit without a doubt. Unfortunately, it also cuts off 3 vital methods of once private communication that journalists were able to utilize when researching stories using foreign contacts. After all, there is no way a journalist can feel comfortable connecting with a foreign source if they know the source is unprotected by journalistic confidentiality. As a result, a short time after the bill was passed the American Civil Liberties Union filed a lawsuit today challenging the constitutionality of the bill. More information on that is available in the Wired posting that alerted me to this travesty of privacy.
This all makes me wonder how much more biased the news will become when responsible journalists can't continue to police the government because their sources are too afraid to come forward. I don't know about you but I have always felt that uncensored journalism was one of the linchpins of free speech that made democratic societies possible.
Oh There's More! What About You and Me?
Journalistic rights are one thing, but what about my fellow Canadians who use Gmail, or for that matter anyone outside of the USA using Gmail? Since we are foreign users of Gmail, are all of our emails subject to inspection? I realize this whole line of questioning invites the response... "why... what do you have to hide?" Well I am sorry but that just doesn't hold water; we all have a right to privacy, and this latest erosion of civil rights in America sets an Orwellian precedent and has international, long-term repercussions that just plain frighten me.
What do you have to say about this latest bill? I realize this is a hot topic for many... in fact I would usually avoid the subject on our corporate blog but I feel it is just too important to shake off. I would love to hear views from our American and international readers on this.
"I have a number of websites used to market XYZ Travel. The main site features all products and a number of specialist / boutique websites feature duplicates of these products. Is this an acceptable practice or am I being penalized by Google? How can this be overcome?
"I accept that creative write-ups can be rewritten if essential but fact is fact - what can I do about factual data i.e. room numbers, types, facilities, amenities, address, rates, name etc?"
ANSWER: Since duplicate content became a hot issue a few years ago I have been asked this question many times. Fortunately it seems you are in a position that should not yield any problems but before I explain let me first state what I understand from your email so there is no miscommunication.
Your state of affairs: You have one larger site that has all of your products and a few other sites that serve as niche sites specializing in particular products from your main website. You are concerned that some essential data such as travel details may count as duplicate content because they have to be identical across both the main site and the niche site(s).
And Now My Answer: Provided each page with the duplicated product details also has substantial unique content I believe you will have no problem with duplicate content "penalties". And I am sure you have substantial unique content on these niche sites otherwise there would be no point in creating the sites short of spamming the search engines which is always a bad idea.
It is worth noting, however, that even if there were an issue of duplicate content, Google representatives have clearly noted that pages with significant duplicate content are merely met with disinterest by the search engine algorithm and passed over; so your site is not directly penalized per se.
The only penalty of any significance takes place when there is evidence of serious duplicate content abuse; when there are nearly identical pages across an entire site. This is common when sites are copies of other sites at which point serious penalties (such as removing the site from the index) are without question the best action.
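If you want a rough feel for how much of a niche page repeats the main site, one simple approach is to compare the two pages' word sequences. The sketch below uses word "shingles" (overlapping three-word runs) and Jaccard similarity. To be clear, this is my own illustration and not how Google measures duplicate content; the function names and the sample text are made up for the example.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(a, b, size=3):
    """Share of shingles two texts have in common (0.0 disjoint, 1.0 identical)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    # Hypothetical page text: the same factual details with different write-ups.
    facts = "Hotel Example has 120 rooms, a pool and free parking."
    main_page = facts + " Book any of our hotels from one place."
    niche_page = facts + " A quiet boutique stay for couples near the old town."
    print(f"Similarity: {jaccard_similarity(main_page, niche_page):.2f}")
```

A score close to 1.0 would suggest the niche page is mostly a copy, while a lower score suggests the shared facts are surrounded by enough unique content. Where to draw the line is a judgment call; nothing here reflects an actual search engine threshold.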
I hope that has helped you. If you or anyone else would like to pose a question to the StepForth team please visit our SEO Questions form and we will get back to you as soon as possible with an answer.
Written by Scott Van Achte and published at 9:59 AM
In today’s race to the top of the Google SERPs (Search Engine Result Pages), there are a number of factors that can help you achieve those coveted spots. While certain techniques may carry more weight than others based on your industry and level of competition, there is no questioning the power of links.
There are several methods, some common and some yet to be discovered, you can try out to help boost your link density and search rankings. While it would be near impossible to go into great detail on all methods (that would require a book) below I have outlined some of the more common techniques a web site owner can use to increase their site’s popularity.
1. Reciprocal Links
Reciprocal links used to be a huge asset and played a significant role. Today, fewer sites are employing this technique as it is thought by many to have no role in the eyes of Google. This is simply not true. While the overall value of reciprocal links has declined over the years, they can and will still help your rankings if done correctly.
The key with reciprocal links is very simple - relevance. If you trade links only with highly relevant sites, you will get value from this. There are some things to watch out for: ensure that the links re