


Sites and Spiders

Objective   3/23/2016

To provide documentation on Sites and Spiders.

Website   3/23/2016

Wikipedia

A website, also written as web site, or simply site, is a set of related web pages typically served from a single web domain.

A website is hosted on at least one web server, accessible via a network such as the Internet or a private local area network through an Internet address known as a uniform resource locator (URL).

All publicly accessible websites collectively constitute the World Wide Web.

Web pages, which are the building blocks of websites, are documents, typically written in plain text interspersed with formatting instructions of Hypertext Markup Language (HTML, XHTML).

They may incorporate elements from other websites with suitable markup anchors.

Web pages are accessed and transported with the Hypertext Transfer Protocol (HTTP), which may optionally employ encryption (HTTP Secure, HTTPS) to provide security and privacy for the user of the web page content.

The user's application, often a web browser, renders the page content according to its HTML markup instructions onto a display terminal.
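As a rough, hedged illustration of that request/response cycle, the Python sketch below fetches a page over HTTPS using only the standard library; the URL is a placeholder, and real-world fetching typically also involves redirects, caching and cookies before the browser renders the markup.

    # Minimal sketch: retrieve a web page over HTTPS (Python standard library only).
    from urllib.request import urlopen

    with urlopen("https://example.com/") as response:   # placeholder URL
        status = response.status                         # e.g. 200 on success
        html = response.read().decode("utf-8", errors="replace")

    print(status)
    print(html[:200])   # the start of the HTML markup a browser would render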

The pages of a website can usually be accessed from a simple Uniform Resource Locator (URL) called the web address.

The URLs of the pages organize them into a hierarchy, although the hyperlinks between them convey the reader's perceived site structure and guide navigation. A site generally includes a home page that links to most of its web content, plus supplementary pages such as about, contact and links pages.

Some websites require a subscription to access some or all of their content.

Examples of subscription websites include many business sites, parts of news websites, academic journal websites, gaming websites, file-sharing websites, message boards, web-based email, social networking websites, websites providing real-time stock market data, and websites providing various other services (e.g., websites offering storing and/or sharing of images, files and so forth).

Web crawler (Spider)   3/23/2016

Wikipedia

A Web crawler is an Internet bot which systematically browses the World Wide Web, typically for the purpose of Web indexing.

Web search engines and some other sites use Web crawling or spidering software to update their web content or indexes of other sites' web content.

Web crawlers can copy all the pages they visit for later processing by a search engine, which indexes the downloaded pages so that users can search them much more efficiently.

Crawlers consume resources on the systems they visit and often visit sites without tacit approval.

Issues of schedule, load, and "politeness" come into play when large collections of pages are accessed.

Mechanisms exist for public sites not wishing to be crawled to make this known to the crawling agent.

As the number of pages on the internet is extremely large, even the largest crawlers fall short of making a complete index.

Crawlers can validate hyperlinks and HTML code.

They can also be used for web scraping (see also data-driven programming).

A Web crawler may also be called a Web spider, an ant, an automatic indexer, or (in the FOAF software context) a Web scutter.

Overview

A Web crawler starts with a list of URLs to visit, called the seeds.

As the crawler visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called the crawl frontier.

URLs from the frontier are recursively visited according to a set of policies.
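A minimal sketch of that seed-and-frontier loop is shown below, assuming plain HTML pages fetched over HTTP; link extraction is deliberately naive, and politeness delays, robots.txt handling and re-visit policies are omitted.

    # Illustrative crawl loop: seeds -> frontier -> visit -> harvest links.
    from collections import deque
    from urllib.parse import urljoin
    from urllib.request import urlopen
    import re

    def crawl(seeds, max_pages=10):
        frontier = deque(seeds)      # the crawl frontier: URLs still to visit
        visited = set()
        while frontier and len(visited) < max_pages:
            url = frontier.popleft()
            if url in visited:
                continue
            visited.add(url)
            try:
                with urlopen(url) as resp:
                    html = resp.read().decode("utf-8", errors="replace")
            except OSError:
                continue             # skip pages that cannot be fetched
            # Naive hyperlink extraction; a real crawler would use an HTML parser.
            for href in re.findall(r'href="([^"#]+)"', html):
                frontier.append(urljoin(url, href))
        return visited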

If the crawler is performing archiving of websites it copies and saves the information as it goes.

The archives are usually stored in such a way that they can be viewed, read and navigated as they were on the live web, but are preserved as "snapshots".

The large volume implies the crawler can only download a limited number of the Web pages within a given time, so it needs to prioritize its downloads.

The high rate of change means that by the time the crawler visits a page, it may already have been updated or even deleted.

The number of possible URLs generated by server-side software also makes it difficult for web crawlers to avoid retrieving duplicate content.

Endless combinations of HTTP GET (URL-based) parameters exist, of which only a small selection will actually return unique content.

For example, a simple online photo gallery may offer three options to users, as specified through HTTP GET parameters in the URL.

If there exist four ways to sort images, three choices of thumbnail size, two file formats, and an option to disable user-provided content, then the same set of content can be accessed with 48 different URLs, all of which may be linked on the site.

This mathematical combination creates a problem for crawlers, as they must sort through endless combinations of relatively minor scripted changes in order to retrieve unique content.
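The arithmetic behind the gallery example (4 sort orders × 3 thumbnail sizes × 2 file formats × 2 settings for user-provided content = 48 URLs) can be checked with a few lines of Python; the parameter names and values below are hypothetical.

    # Enumerate the 48 parameter combinations a crawler would see as distinct URLs.
    from itertools import product

    sorts   = ["date", "name", "size", "rating"]   # 4 ways to sort images
    thumbs  = ["small", "medium", "large"]         # 3 thumbnail sizes
    formats = ["jpg", "png"]                       # 2 file formats
    user    = ["on", "off"]                        # user-provided content shown or hidden

    urls = [f"/gallery?sort={s}&thumb={t}&fmt={f}&user={u}"
            for s, t, f, u in product(sorts, thumbs, formats, user)]
    print(len(urls))   # 48 URLs, all serving essentially the same content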

As Edwards et al. noted, "Given that the bandwidth for conducting crawls is neither infinite nor free, it is becoming essential to crawl the Web in not only a scalable, but efficient way, if some reasonable measure of quality or freshness is to be maintained."

A crawler must carefully choose at each step which pages to visit next.

Google Search Console   3/23/2016

Wikipedia   Google Search Console

(previously Google Webmaster Tools) is a no-charge web service by Google for webmasters. It allows webmasters to check indexing status and optimize visibility of their websites. As of May 20, 2015, Google rebranded Google Webmaster Tools as Google Search Console.[1][2] It has tools that let webmasters:

  1. Submit and check a sitemap.
  2. Check and set the crawl rate, and view statistics about when Googlebot accesses a particular site.
  3. Write and check a robots.txt file to help discover pages that are accidentally blocked in robots.txt (an example robots.txt file appears after this list).
  4. List internal and external pages that link to the site.
  5. Get a list of links which Googlebot had difficulty crawling, including the error that Googlebot received when accessing the URLs in question.
  6. See what keyword searches on Google led to the site being listed in the SERPs, and the click through rates of such listings.
    (Previously named 'Search Queries'; rebranded May 20, 2015 to 'Search Analytics' with extended filter possibilities for devices, search types and date periods).
  7. Set a preferred domain (e.g. prefer example.com over www.example.com or vice versa), which determines how the site URL is displayed in SERPs.
  8. Highlight to Google Search elements of structured data which are used to enrich search hit entries (released in December 2012 as Google Data Highlighter).
  9. Demote Sitelinks for certain search results.
  10. Receive notifications from Google for manual penalties.
  11. Provide access to an API to add, change and delete listings and list crawl errors.
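For items 1 and 3 above, a hypothetical robots.txt file of the kind checked with these tools might look like the following; the host name, paths and sitemap location are placeholders.

    # Hypothetical robots.txt for example.com (placeholder rules and sitemap location).
    User-agent: Googlebot
    Disallow: /private/

    User-agent: *
    Disallow: /tmp/
    Disallow: /cgi-bin/

    Sitemap: https://example.com/sitemap.xml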

Google Trends   3/23/2016

Wikipedia   Google Trends

is a public web facility of Google Inc., based on Google Search, that shows how often a particular search-term is entered relative to the total search-volume across various regions of the world, and in various languages.

The horizontal axis of the main graph represents time (starting from 2004), and the vertical is how often a term is searched for relative to the total number of searches, globally.

Below the main graph, popularity is broken down by countries, regions, cities and language.

What Google calls "language", however, does not display the relative results of searches in different languages for the same term(s).

It only displays the relative combined search volumes from all countries that share a particular language (see "flowers" vs "fleurs").

It is possible to refine the main graph by region and time period.

On August 5, 2008, Google launched Google Insights for Search, a more sophisticated and advanced service for displaying search trends data.

On November 29, 2012, Google merged Google Insights for Search into Google Trends.


Googlebot   3/23/2016

Wikipedia   Googlebot

is the search bot software used by Google, which collects documents from the web to build a searchable index for the Google Search engine.

If a webmaster wishes to restrict the information on their site available to Googlebot, or another well-behaved spider, they can do so with the appropriate directives in a robots.txt file, or by adding the meta tag <meta name="Googlebot" content="nofollow"> to the web page.[1] Googlebot requests to Web servers are identifiable by a user-agent string containing "Googlebot" and a host address containing "googlebot.com".[2]
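A hedged Python sketch of that identification rule: treat a request as Googlebot only if its user-agent mentions "Googlebot" and its IP address reverse-resolves to a googlebot.com host. The function name and sample behaviour are illustrative, and a stricter check would also forward-resolve the returned hostname.

    # Illustrative check only: user-agent string plus reverse DNS lookup of the client IP.
    import socket

    def looks_like_googlebot(user_agent, remote_ip):
        if "Googlebot" not in user_agent:
            return False
        try:
            hostname, _, _ = socket.gethostbyaddr(remote_ip)   # reverse DNS lookup
        except socket.herror:
            return False
        # Per the description above, genuine Googlebot requests come from googlebot.com hosts.
        return hostname == "googlebot.com" or hostname.endswith(".googlebot.com")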

Currently, Googlebot follows HREF links and SRC links.[1] There is increasing evidence that Googlebot can execute JavaScript and parse content generated by Ajax calls as well.[3][4] There are many theories regarding how advanced Googlebot's JavaScript processing is, with some suggesting it has only minimal ability derived from custom interpreters.[5][6][7] Googlebot discovers pages by harvesting all the links on every page it finds.

It then follows these links to other web pages.

New web pages must be linked from other known pages on the web, or manually submitted by the webmaster, in order to be crawled and indexed.

A problem which webmasters have often noted with Googlebot is that it takes up an enormous amount of bandwidth.[citation needed] This can cause websites to exceed their bandwidth limit and be taken down temporarily.

This is especially troublesome for mirror sites which host many gigabytes of data.

Google provides "Webmaster Tools" that allow website owners to throttle the crawl rate.

Internet bot, also known as web robot   3/23/2016

Wikipedia

An Internet bot, also known as web robot, WWW robot or simply bot, is a software application that runs automated tasks (scripts) over the Internet.

Typically, bots perform tasks that are both simple and structurally repetitive, at a much higher rate than would be possible for a human alone.

The largest use of bots is in web spidering, in which an automated script fetches, analyzes and files information from web servers at many times the speed of a human.

Given the exceptional speed with which bots can perform their relatively simple routines, bots may also be implemented where a response speed faster than that of humans is required.

Common examples include gaming bots, where a player gains a significant advantage by automating some repetitive routine rather than performing it manually, and auction-site robots, where last-minute bid-placing speed may determine who places the winning bid; using a bot to place counterbids affords a significant advantage over bids placed manually.

Bots are routinely used on the internet where the emulation of human activity is required, for example chat bots.

A simple question and answer exchange online may appear to be with another person, when in fact it is simply with a bot.

While bots are often used to simply automate a repetitive online interaction, their ability to mimic actual human conversation and avoid detection has resulted in the use of bots as tools of covert manipulation.

On the internet today bots are used to artificially alter, disrupt or even silence legitimate online conversations.

Bots are sometimes implemented, for example, to overwhelm the discussion of some topic which the bot's creator wishes to silence.

The bot may achieve this by drowning out a legitimate conversation with repetitive bot-placed posts which may in some cases appear to be reasonable and relevant, in others simply unrelated or nonsense chatter, or alternatively by overwhelming the target website's server with constant, repetitive, pointless bot-placed posts.

These bots play an important role in modifying, confusing and silencing conversations about, and the dissemination of, real information regarding sensitive events around the world.

The success of bots may be largely due to the very real difficulty in identifying the difference between an online interaction with a bot versus a live human.

Given that bots are relatively simple to create and implement, they are a very powerful tool with the potential to influence every segment of the World Wide Web.

Efforts by servers hosting websites to counteract bots vary.

Servers may choose to outline rules on the behaviour of internet bots by implementing a robots.txt file: this file is simply text stating the rules governing a bot's behaviour on that server.

Any bot interacting with (or 'spidering') any server that does not follow these rules should, in theory, be denied access to, or removed from, the affected website.

If the only rule implementation by a server is a posted text file with no associated program or software, then adhering to those rules is entirely voluntary; in reality there is no way to enforce them, or even to ensure that a bot's creator or implementer acknowledges, or even reads, the robots.txt file contents.
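A well-behaved bot can honour those voluntary rules with Python's standard urllib.robotparser, as in the sketch below; the URL and user-agent name are placeholders.

    # Check robots.txt before fetching a page (compliance is still voluntary).
    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")   # placeholder host
    rp.read()                                      # download and parse the rules

    if rp.can_fetch("ExampleBot", "https://example.com/private/page.html"):
        print("allowed to fetch")
    else:
        print("disallowed by robots.txt")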

Personal web pages   3/23/2016

Wikipedia

Personal web pages are World Wide Web pages created by an individual to contain content of a personal nature rather than on behalf of a company, organization or institution.

Personal web pages are primarily used for informative or entertainment purposes but can also be used for personal career marketing, social networking, or personal expression.

The terms "personal web site", "personal home page", or, most commonly, just "home page" are also often used interchangeably with "personal web page".

These terms do not usually refer to just a single "page" or HTML file, but to a collection of pages and related files under a common URL or Web address.

In strictly technical terms, a site's actual home page (index page) often contains only sparse content with some interesting or catchy introductory material, and serves mostly as a pointer or table of contents to the content-rich pages inside, such as résumés, family, hobbies, family genealogy, a blog, opinions, online journals and diaries or other writing, work, sound clips, movies, photos, or other interests.

Many personal pages only include information of interest to friends and family of the author but some can be valuable topical web directories.

Motivations

Many people maintain personal web pages as a medium to express opinions or creative endeavors that otherwise would not have an outlet.

They also provide a link from the world to the individual along the lines of a telephone book listing.

For those not well-versed in HTML and other Web technologies, personal accounts with social networking services may be faster to set up for creating a simple personal Web presence (due in part to the communal nature of social networks), provided that the page's author does not object to the network's online advertising and in some cases exclusion of readers who do not wish to open an account.

Institutions such as universities often provide home page facilities to their members which are both advertisement-free and world-readable without registration, although the content might be subject to institutional rules.

A personal web page can be used for self promotion, to provide quick access to information, or just as something "cool".

A personal web page generally gives the owner more control over their presence in search results and over how they wish to be viewed online.

It also allows more freedom in types and quantity of content than a social network profile offers, and can link various social media profiles with each other.

It can be used to correct the record on something, or to clear up potential confusion between the owner and others who share the same name.

Early personal web pages were often called "home pages" and were intended to be set as a default page in a web browser's preferences, usually by their owner.

These pages would often contain links, to-do lists, and other information their author found useful.

In the days when search engines were in their infancy, these pages (and the links they contained) could be an important resource in navigating the web.

Website builder   3/24/2016

Wikipedia, the free encyclopedia

Website builders are tools that allow the construction of websites without manual code editing.

They fall into two categories:

  1. Online proprietary tools provided by web hosting companies. These are typically intended for users to build their private site. Some companies allow the site owner to install alternative tools (commercial or open source); the more complex of these may also be described as content management systems.

  2. Software which runs on a computer, creating pages offline, which can then be published on any host. (These are often considered to be "website design software" rather than "website builders".)

Weebly   3/25/2016

Wikipedia, the free encyclopedia


Weebly is a web-hosting service featuring a drag-and-drop website builder.

As of August 2012, Weebly hosted over 20 million sites, with over 1 million unique visitors per month.

The company is headquartered in San Francisco.

The company was founded by chief executive officer (CEO) David Rusenko, chief technology officer (CTO) Chris Fanini, and chief operating officer (COO) Dan Veltri.

The startup competes with Wix.com, Webs, WordPress.com, Squarespace.com, Jimdo, Yola, SnapPages, and other web-hosting and creation websites.


kimbersoft.com is hosted on a re-seller Virtual Private Server

This page was last updated April 30th, 2017 by kim
