Intro
The goal of search engine optimization (SEO) is to get
higher rankings. This is done using a two-step process.
The first step is to use industry information that
allows professional optimizers to understand what search
engine robots are looking for. This is a complex process
because a Web site's placement within a spider-driven
search engine is derived from hundreds of variables such
as link popularity, click popularity, keyword density,
Web site themes and more. The second step is to
eliminate or reduce on-site techniques that can impede
the search engine spidering process.
There are 2 types of search engines: Search Engine
Spiders and Search Engine Directories. Each search
engine uses unique criteria for "ranking" sites - this
"ranking" determines their placement in the search
engine results page (SERP). Spider-driven engines use
robots to spider sites on the Internet. Robots "crawl"
each site and "score" pages based on relevancy.
Directories rely on human editors, rather than robots,
to review sites. Some engines score the index page while
others score individual web pages.
The topics mentioned below could either block search
engine spiders from crawling your site accurately or
ultimately get your site penalized in certain engines
and/or directories. Take these into serious
consideration when designing and/or optimizing your
website.
Duplicate Content
Creating duplicate content/mirrors/redirects might be
one of the worst things you could possibly do if you
want to succeed in the search engines. When search
engines were first getting popular, you could simply
point 10 domain names to the same Web site and they
would all stack up on the same page of results for the
same keywords. In other words, if you ranked well for
one phrase, all 10 of those sites would do the same. This
was a burden to the search engines, so now they use very
sophisticated algorithms to filter out duplicate
content.
They examine all aspects of site structure, image names,
and matching text. When too many of these areas match
another web site it triggers a red flag, and the site is
penalized. Beware of mirror sites, affiliate sites, or
any other "cookie-cutter" web marketing service that
promises big profits with little effort. Today's engines
will remove or reject duplicate content, so this usually
leads to failure.
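The engines do not publish how their duplicate filters
work, so the following is only a minimal sketch of one
common way to measure how much two pages' text matches:
split each page into overlapping three-word "shingles"
and compare the two sets. The function names and sample
text are illustrative assumptions, not any engine's
actual method.

```python
import re

def shingles(text, size=3):
    """Return the set of overlapping word groups ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def similarity(text_a, text_b, size=3):
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Two hypothetical pages that differ by a single word.
page_one = "Cheap web hosting with free setup and daily backups"
page_two = "Cheap web hosting with free setup and weekly backups"
score = similarity(page_one, page_two)
print(f"similarity: {score:.2f}")
```

A score near 1.0 means the two pages are close to
identical; original content should score far lower
against any other page.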
If you want to thrive on the web make sure your site has
original and unique content. The safest way to get top
search engine placement is to produce real content.
Following are three examples of the use of duplicate
content:
- Mirrored Sites: Mirrored sites occur when
the web site sits in one folder on a particular
server, has one IP address, but 2 different domains
pointing to the same folder. So, when you type in
the two different URLs they both bring up the same
site. This is a horrible scenario for search
engines. Google® sees this as an attempt to trick it,
treats it as duplicate content and will penalize the
site. It can choose to index only one of the domains or
to list neither of them.
- Cloned Sites: Cloned sites sit in two different
folders, with two different IP addresses and two
different domain names. This scenario is generally ok
with the search engines as long as at least 40% of
the site content is different from the other. Again,
engines will not stand for duplicate content and will
penalize the sites if they are not different. The
database of products can stay the same as long as the
total look and feel of each site is different.
- Redirects: Pointing multiple domains to
one site seems to be very popular these days. This
is done by registering multiple domains and
redirecting to one main domain name. Again, this can
cause problems because the engines think you are
trying to trick them by taking them somewhere else.
They treat this just like a clone, consider it
duplicate content, and can penalize the site. If you
must have multiple domains, use a 301 redirect on
all secondary domain names pointing to your main
URL.
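A 301 redirect for secondary domains can be sketched as
follows. This is an illustration only, using Python's
built-in http.server module; the host name
www.example.com and the helper names are assumptions,
and a production site would normally configure the
redirect in its web server instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

PRIMARY_HOST = "www.example.com"  # hypothetical main domain

def redirect_location(host, path):
    """Return the main-domain URL to redirect to, or None if already there."""
    hostname = (host or "").split(":")[0].lower()
    if hostname == PRIMARY_HOST:
        return None
    return "http://" + PRIMARY_HOST + path

class CanonicalHostHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        location = redirect_location(self.headers.get("Host"), self.path)
        if location is not None:
            # 301 tells browsers and spiders the move is permanent.
            self.send_response(301)
            self.send_header("Location", location)
            self.end_headers()
            return
        body = b"Main site content"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port=8000):
    """Run the sketch locally; not called automatically."""
    HTTPServer(("127.0.0.1", port), CanonicalHostHandler).serve_forever()
```

Calling serve() starts a local test server. Any request
whose Host header is not the main domain is answered
with a permanent redirect to the same path on the main
domain.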
Doorways
Many believe doorway pages are an essential aspect of
effective web site optimization. In an effort to
improve rankings, however, some marketers have spammed
the search engines with doorway pages, generating
multiple pages with little information, which has made
doorway pages a topic of much controversy. Search
engines have responded to this practice, and are now
much stricter in their rules and requirements. Filters
have been created to block the "spammers' rendition" of
doorway pages.
A doorway page, or gateway page, is an alternate
entryway to a web site created in the interest of
obtaining a top ranking on particular keyword phrases in
a major search engine. Doorway pages are often hosted at
a different location than the original site. In other
words, a new domain name is registered (usually one that
includes keyword phrases) and the doorway page is
created on that domain name, with links to a destination
page on another web site. Typically, these pages match
the look and feel of the original site. You should avoid
registering a large number of domains with this tactic
because it could be considered spam by the search
engines and could get your site penalized.
Frames
Frames present some great possibilities to us from a
design standpoint, but they should be avoided if at all
possible when it comes to getting your site listed in
the search engines. Many spider-based engines cannot
crawl through them, and specific coding is necessary to
make them readable by the engines. This coding is viewed
as spam amongst most of the search engines. Spiders want
to be able to read and view everything that the visitor
of the site can. However, if your web site does use
frames, make sure that you take advantage of the No
Frames content area, the part of the page that doesn't
rely on frames. It's a very powerful section of the
site, and if used properly it can result in some
excellent rankings.
Nevertheless, frames do pose unique problems and many
spiders cannot read them. The good news is that despite
the limitations frames pose, many frameset 'issues' can
be turned into frameset 'positives.'
So, if you are going to use frames for search engine
optimization make sure that you use them wisely. You can
still create a pleasing interface on a two-frame set by
specifying the dimensions of your top or bottom frame as
5 to 8 pixels or 5 to 8%. That should help you avoid the
spam filters.
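To see roughly what a spider that cannot follow frames
is left with, the short sketch below pulls the text out
of the No Frames area of a frameset page. The sample
page and the function name are assumptions made for
illustration, and the regular expressions are a crude
stand-in for a real HTML parser.

```python
import re

def noframes_text(html):
    """Return the plain text inside the first <noframes> block, if any."""
    match = re.search(r"<noframes\b[^>]*>(.*?)</noframes\s*>", html,
                      re.IGNORECASE | re.DOTALL)
    if not match:
        return ""
    # Replace tags with spaces, then collapse runs of whitespace.
    text = re.sub(r"<[^>]+>", " ", match.group(1))
    return " ".join(text.split())

# Hypothetical two-frame page with an 8% top frame.
sample = """
<html>
<frameset rows="8%,*">
  <frame src="banner.html">
  <frame src="main.html">
  <noframes>
    <body>
      <h1>Acme Car Parts</h1>
      <p>Brake pads, filters and belts for most models.</p>
    </body>
  </noframes>
</frameset>
</html>
"""
visible_to_spider = noframes_text(sample)
print(visible_to_spider)
```

If this prints nothing for your own frameset page, a
spider that cannot read frames has no content to index.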
Cloaking
Cloaking, also known as spoofing, is a method of web
page delivery where different pages are served from the
same address depending on whether the visitor is a human
or a spider. In other words, browsers such as Internet
Explorer are served one page, and spiders visiting the
same address are served a different page, usually an
optimized page. There are two methods of delivering
cloaked pages - IP address and Agent name.
There are two reasons people use cloaking techniques.
- Since viewers never see the page that is
viewable to the spider, the code cannot be stolen.
In highly competitive markets the ability to conceal
code from the competition can be an extremely
powerful advantage.
- Since a human visitor never sees the page that
is served to the spider, the spider page does not
have to be aesthetically pleasing. As a result, it
can be optimized with every trick in the book.
By using cloaking, nobody sees the page except for
the spider. That gives cloaked pages an extremely
powerful advantage over web pages that were optimized to
accommodate a professionally appealing design.
But cloaking may be one of the most frowned-upon
techniques among all engines. Filters will pick up pages
delivered by the following methods in no time:
- IP Address Delivery:
An Internet Protocol Address is a numeric address,
which identifies your connection and presence on the
Internet. In addition to sites having IP addresses,
so do the spiders.
Since you can 'sniff' for the IP address when
someone visits your site, you can use this data to
push specific pages to the spiders. This method is
more complicated than Agent Name Delivery because it
requires you to maintain a never-ending list of
spider IP addresses, which change all the time as new
ones are added. The advantage of IP Address Delivery
is that a visitor cannot easily mimic a spider's IP
address, making it very hard for anyone else to see
the code that is presented to the spider.
- Agent Name:
Delivering a specific page based on agent name is a
rather simple but risky task. You simply use code that
sends the visitor to one page and the spider to
another. While very
effective, agent delivery is not a foolproof way to
hide your code. Someone can easily use an
agent-faking program to report his or her agent name
as that of a spider when visiting your page. They
will then see exactly what is being served on each
page.
Also, some browsers offer the user a choice of User
Agent variables to submit to any web site they
visit. Consequently, someone might spoof a search
engine spider's User Agent variable to detect
whether you are using cloaked pages. Whatever the
case, any time you use cloaking you take the chance
of being labeled a spammer, a very good chance to
say the least.
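The User Agent check described above is easy to
automate, which is part of why agent-name cloaking is so
risky. The sketch below requests the same address twice,
once reporting a browser's agent name and once a
spider's, and compares what comes back. Both agent
strings and all function names are illustrative
assumptions, and pages that legitimately change on every
request would need a more forgiving comparison.

```python
import urllib.request

# Illustrative agent names, not those of any real browser or spider.
BROWSER_AGENT = "Mozilla/5.0 (compatible; ExampleBrowser/1.0)"
SPIDER_AGENT = "ExampleSpider/1.0 (+http://spider.example/bot)"

def fetch(url, user_agent):
    """Download a URL while reporting the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()

def serves_different_pages(url, fetch_page=fetch):
    """Return True when the two agent names receive different page bodies."""
    as_browser = fetch_page(url, BROWSER_AGENT)
    as_spider = fetch_page(url, SPIDER_AGENT)
    return as_browser != as_spider
```

The fetch_page parameter exists so the comparison can be
tried against a stand-in function before pointing it at
a live site.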
IP cloaking is abusive in how it attempts to
manipulate a search engine's index. Since IP cloaking is
deceptive, search engines routinely purge IP cloaked
pages and in some cases ban these web sites permanently.
Link Farms
Since so many engines use link popularity as an
integral part of their ranking algorithms, many
webmasters responded by joining link farms and stuffing
their sites and others with as many links as possible.
But not all links are good links. In fact, bad linking
strategies may get you banned from some engines.
A link farm is a network of web pages that are heavily
cross-linked with each other for the sole purpose of
increasing link popularity. The web pages are usually on
more than one domain or on more than one server. When a
web site joins a link farm, it gets a link from each of
these pages and in turn has to link back to each of
those pages. This inflates the link popularity of the
site. But search engines can detect link farms as well
as the web sites participating in them. Google®, for one,
disapproves of link farms and labels the links they
generate as spam. In fact, some sites get removed from
the index altogether if they are affiliated with link
farms or link stuffing.
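How engines actually detect link farms is not public,
but the heavy cross-linking described above leaves a
simple trace: nearly every outbound link is returned.
The sketch below measures that for a small, made-up link
graph. The site names and the function are assumptions
for illustration, and a high ratio on its own does not
prove that a site belongs to a link farm.

```python
def reciprocal_ratio(links, site):
    """Fraction of a site's outbound links that link back (0.0 to 1.0)."""
    outbound = links.get(site, set())
    if not outbound:
        return 0.0
    returned = {target for target in outbound
                if site in links.get(target, set())}
    return len(returned) / len(outbound)

# Hypothetical link graph: each site maps to the sites it links to.
links = {
    "cars.example": {"parts.example", "forum.example", "farm-a.example"},
    "parts.example": {"cars.example"},
    "forum.example": set(),
    "farm-a.example": {"cars.example", "farm-b.example"},
    "farm-b.example": {"farm-a.example", "cars.example"},
}

for site in sorted(links):
    print(site, round(reciprocal_ratio(links, site), 2))
```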
Because of this, some webmasters have chosen to remove
all links going out to other sites. That is an
overreaction that decreases the site value to visitors
and hurts the Web in general because cross-linking is a
basic tenet of the Internet. Links are fine - even
encouraged - if they are related to your topic, but link
farms rarely provide useful content to visitors. If your
site is selling cars, linking to car parts sites, car
forums and other car-related sites is very safe and
encouraged. You are only providing access to other sites
that are of interest to your visitors. But, if you
signed up with a service that promises to generate five
hundred inbound links to your site only if you agree to
add two hundred outbound links in return, then you are
likely participating in a link farm.
Instead of linking to related information of value to
your visitors, you are sending them to sites with
non-relevant and useless information. Search engines
will not penalize you for good, relevant links, but are
quick to punish sites that try to spam them with
unrelated links.
Spider Design Blocks
Despite the best efforts to make your site look
unique and attractive, some of the web's most prized
design technology can be a major stumbling block for a
search engine spider.
Flash sites (or Flash introductions), while beautiful,
cannot be read by a spider. Your options are to use an
entrance page that is rich in keyword phrases, to create
a two-frame frameset where one frame is only one pixel
high and use the No Frames area, or to alternate the use
of Flash and static HTML. Following are other design
attributes that block spiders:
- Frames - despite the unique design and
product capabilities they present, frames can be a
major problem for search engine spiders. Many spiders
cannot read them at all. The quick solution is to
use your No Frames content to optimize your page or
to stay away from frames altogether.
- Image Maps - can pose a problem with some
engines. If you plan to use an image map, make sure
there are other links on the page (perhaps at the
bottom) that link to your other pages.
- Password Protected Pages - are pages you
probably do not want in the engines anyway. Just be
aware that the spider, like a human without the
password, cannot enter any area that is protected by
a password.
- PDF Files - usually viewed with Adobe
Acrobat Reader, present a major stumbling block to
most spiders. Some engines (specifically Google®)
are beginning to index these kinds of pages, but
from an optimization perspective this is one format
you want to avoid.
Search Engine Spamming
Search engine spamming is the use of unethical
techniques to improve the position of a Web site in a
search engine. Some Web site owners use these techniques
to try to fool the search engines into ranking their
sites higher.
Each search engine's objective is to produce the most
relevant results for its visitors. Producing the most
relevant results for any particular search query is what
makes a search engine popular.
Every search engine measures relevancy according to its
own algorithm, thereby producing a different set of
results. Search engine spam occurs if anybody tries to
artificially influence a search engine's basis of
calculating relevancy.
The following techniques can be considered spamming:
- Code swapping ("bait & switch")
This means optimizing a page for high search engine
position, and then swapping another page in its
place once a top rank is achieved. This technique
will not lead to a long-lasting search engine
placement because filters have been implemented
across the board to detect this.
- Content Spam
With this spam technique, part of the data in a web
resource is visible only to the search engines. Some
commonly used content spam techniques are as follows:
Invisible text - Hiding keywords within the
background by using exact or similar font colors is
one of the most common search engine spam techniques
to date. This can be done by placing the text in a
table or on a background whose color matches the
text but differs from the real background of the
site.
Keyword stuffing - Another very popular
search engine spam trick, used along with hidden
text, is the repetition of keywords at the bottom of
the page in very small fonts. Since the text is
hidden, keywords are crammed into a section of the
site with the intent of capturing the spider's
attention.
- E-mail spamming
E-mail spamming means sending unsolicited commercial
messages to email addresses from unknown sources.
These messages can include, but are not limited to,
chain emails, get-rich schemes and messages that
contain adult-related material.
There are various ways of collecting email
addresses. The easiest is to collect them from
newsgroups. Newsgroups are a great source of
information, but spammers collect email addresses
out of the posted articles in the newsgroups with
the help of special software. Email spamming can be
used to generate links and solicit search engine
submission services.
- Meta spam
In order to manipulate a search engine's
relevancy algorithms, meta data can be written to
describe a web resource inaccurately or incoherently.
Following are the common Meta spam techniques:
Unrelated keywords - In order to fool
crawlers it has become a common technique to use
popular keywords that are not relevant to the site's
content. For the time being, one may be able to
trick a few people searching for such words into
clicking on the link, but they will soon leave the
site when they receive irrelevant information on the
topic they were originally searching for. This kind
of search engine spam upsets both the search engines
and their users.
Hidden tags - The use of keywords in hidden
HTML tags is, for the most part, considered spam by
most search engines and will warrant penalization.
These tags can include, but are not limited to:
title, meta description, http-equiv, comment,
style, hidden value, font, alt, author, option and
noframes.
- No content
If sites do not contain any unique and relevant
content to offer visitors, search engines can
consider this spam. On that note, illegal content,
duplicate content and sites consisting largely of
affiliate links are also considered to be of low
value to search engine relevancy.
- Over submitting
Each search engine has its own regulations on how
many pages can be submitted at a time and how often
they can be submitted. Submitting the same page more
than once a month to the same search engine, or
submitting too many pages each day, is not allowed.
Overview
Search engines strive to provide the most relevant
results to their users, but spam swamps their indexes
with irrelevant and misleading information. Therefore,
it is advisable to make no mistakes and stay clear of
anything that could be seen as spam by the engines.
Instead, focus on an ethical approach to SEO. Search
engines will always react to spam techniques once they
become a big enough issue to affect searchers. Banning
is a last resort but has definitely been known to
happen.
The following list will give you an idea of the basic
"DON'TS" for the search engines:
- Do not use text that is the same color as the
background, or only slightly different from it, to
'hide' keywords.
- Do not repeat the keywords in the Meta tags (use
them only once), and do not use keywords that are
unrelated to the site's content.
- Do not create a title like "web hosting, web
hosting, web hosting." This is considered spam.
- Do not repeat the keyword to increase its
frequency on a page (Keyword stuffing). Search
engines now have the ability to detect this: they
can spider a page and determine whether the
frequency is above a "normal" level in proportion to
the rest of the words in the document - this is also
known as keyword density.
- Do not optimize a page for top ranking, and then
swap another page in its place once a top ranking is
achieved.
- Do not put misleading words on the page in the
hopes of attracting visitors looking for another
topic.
- Do not submit a page to the search engines that,
once loaded, automatically redirects to a page of
different information.
- Do not create a page that prohibits the user
from using the browser's back button to return to
the search engine results.
- Do not create "doorway pages."
- Do not submit the same page more than once on
the same day to the same search engine.
- Do not put multiple instances of the Title Tag
in the HTML code.
- Do not put pages of content in layers and
position them off screen or practice the same kind
of behavior by turning the visibility of the layers
off.
- Do not use small or 'invisible text' on the
page.
- Do not send queries to a search engine with an
automated 'rank reporting tool' hundreds of times
per day.
- Do not purchase multiple domains and put
duplicate copies of the web site on each domain.
- Do not participate in Link Farm programs.
- Do not submit different versions of the web site
in the hope of getting multiple listings.
- Do not submit more than the allowed number of
pages per engine per day or week. Each engine has a
limit on how many pages one can manually submit to
it using its online forms.
- Do not cloak.
- Do not support affiliate sites with the same or
similar content but different site designs.
- Do not create a page that is stuffed with
keyword content so far down the page that it is
unlikely anyone will ever scroll down that far.
- Do not create a plain page specifically designed
to rank highly, and then once indexed, upload a
different page to your server.
- Do not put hundreds of 1x1 transparent GIF images on
your page and assign them all the same ALT text.
This is rather easy to detect.
- Do not use CSS to set the text size of a
particular tag to 0% and then fill your page with
'invisible text.'
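The keyword density mentioned in the list above is
simply the share of a page's words taken up by one
keyword. A minimal sketch follows; it handles
single-word keywords only, and the sample sentences are
made up for illustration. What level counts as "normal"
is not published by the engines, so no threshold is
assumed here.

```python
import re

def keyword_density(text, keyword):
    """Share of the words in text that equal keyword, from 0.0 to 1.0."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

# Hypothetical page text: one natural, one stuffed.
natural = "We offer reliable hosting plans with daily backups and friendly support."
stuffed = "hosting hosting cheap hosting best hosting hosting deals hosting"

print(f"natural: {keyword_density(natural, 'hosting'):.1%}")
print(f"stuffed: {keyword_density(stuffed, 'hosting'):.1%}")
```

The stuffed sample devotes six of its nine words to the
keyword, against one word in eleven for the natural
sentence.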
As you can see, there are a lot of ways to fool the
search engines, but just about all of them are
detectable - and that makes them very dangerous.
If you are serious about custom delivery to the
engines, there is really only one way to go - and that
is with professional search engine optimization.
