Another article about the indexing of ajax sites by search engines

The stylish, fashionable youth of today build sites on AJAX: from the user's point of view it is fast and convenient, but search engines can have problems with such sites.
The best solution is to use normal links but load the content via AJAX, while keeping the ability to get the content through a regular link for users with JS disabled (you never know) and for robots. That is, initially the site needs to be developed the old-fashioned way, with regular links, layout and views; then all the links can be processed with JavaScript, attaching to them content loading via AJAX that takes the URL from the href attribute of the a tag. In a very simplified form it should look like this:
$(document).on('click', 'a.ajaxlinks', function(e) {
    e.stopPropagation();
    e.preventDefault();
    var pageurl = $(this).attr('href');
    $.ajax({
        url: pageurl,
        data: {
            ajax: 1
        },
        success: function( resp ) {
            $('#content').html(resp);
        }
    });
});
Here we simply load the same page via AJAX; on the backend we need to handle the special GET parameter ajax and, if it is present, return the page without the layout (roughly speaking).
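The backend side is not shown here; as a minimal sketch of the idea (assuming a Node.js/Express server and hypothetical 'page' and 'content' templates, none of which come from this article), it could look roughly like this:
var express = require('express');
var app = express();

// Hypothetical route: when the ajax=1 GET parameter is present, render only the
// content fragment; otherwise render the full page with its layout.
// The route, template names and data are assumptions for illustration;
// view-engine setup is omitted.
app.get('/cats/:slug', function(req, res) {
    var data = { slug: req.params.slug };
    if (req.query.ajax) {
        res.render('content', data);   // fragment only, no layout
    } else {
        res.render('page', data);      // full page with layout
    }
});

app.listen(3000);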
But sites whose architecture is aimed at the likes of AngularJS work differently: they put the content loaded via AJAX into an html template with variables. For such sites (you could already call them applications), search engines came up with the HashBang scheme. In short, these are links of the form example.com/#!/cats/grumpy-cat: when a search robot sees #!, it requests example.com/?_escaped_fragment_=/cats/grumpy-cat from the server, i.e. it replaces "#!" with "?_escaped_fragment_=", and the server must return generated html to the search engine that is identical to what the user sees at the original link. But if the application uses the HTML5 History API rather than #! links, you need to add a special meta tag to the head section:
<meta name="fragment" content="!" />
When it sees this tag, the search engine understands that the site runs on AJAX and will request all of the site's content through links like example.com/?_escaped_fragment_=/cats/grumpy-cat instead of example.com/cats/grumpy-cat.
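In AngularJS terms, the HTML5 History API case is html5Mode; a minimal sketch of such a configuration (the module name 'app' is an assumption, not something from the article):
// Enabling the HTML5 History API in AngularJS 1.x; this is exactly the case
// where the <meta name="fragment" content="!"> tag above is required.
angular.module('app', []).config(['$locationProvider', function($locationProvider) {
    $locationProvider.html5Mode(true);    // pretty URLs like /cats/grumpy-cat
    // $locationProvider.hashPrefix('!'); // the hashbang alternative: /#!/cats/grumpy-cat
}]);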
You could handle these requests in the framework you use, but in a complex AngularJS application that turns into a pile of extra code.
The approach we will take is shown in the following diagram from Google:

To do this, we will catch all requests containing _escaped_fragment_ and hand them to phantom.js running on the server: this server-side WebKit will generate an html snapshot of the requested page and give it to the crawler. Regular users will keep working with the site directly, as before.
To get started, install the necessary software, if it is not installed yet:
yum install screen
npm install phantomjs
ln -s /usr/local/node_modules/phantomjs/lib/phantom/bin/phantomjs /usr/local/bin/phantomjs
Next, write (or take a ready-made) server-side js script (server.js) that will produce the html snapshots:
var system = require('system');

if (system.args.length < 3) {
    console.log("Missing arguments.");
    phantom.exit();
}

var server = require('webserver').create();
var port = parseInt(system.args[1]);
var urlPrefix = system.args[2];

// Parse the query string of the request URL into an object
// (an <a> element is used as a URL parser).
var parse_qs = function(s) {
    var queryString = {};
    var a = document.createElement("a");
    a.href = s;
    a.search.replace(
        new RegExp("([^?=&]+)(=([^&]*))?", "g"),
        function($0, $1, $2, $3) { queryString[$1] = $3; }
    );
    return queryString;
};

// Open the page, let it render, and pass the resulting html to the callback.
var renderHtml = function(url, cb) {
    var page = require('webpage').create();
    page.settings.loadImages = false;
    page.settings.localToRemoteUrlAccessEnabled = true;
    page.onCallback = function() {
        cb(page.content);
        page.close();
    };
    // page.onConsoleMessage = function(msg, lineNum, sourceId) {
    //     console.log('CONSOLE: ' + msg + ' (from line #' + lineNum + ' in "' + sourceId + '")');
    // };
    page.onInitialized = function() {
        page.evaluate(function() {
            // Give the app 10 seconds to render, then signal back via callPhantom,
            // which triggers onCallback above.
            setTimeout(function() {
                window.callPhantom();
            }, 10000);
        });
    };
    page.open(url);
};

server.listen(port, function (request, response) {
    var route = parse_qs(request.url)._escaped_fragment_;
    // var url = urlPrefix
    //     + '/' + request.url.slice(1, request.url.indexOf('?'))
    //     + (route ? decodeURIComponent(route) : '');
    var url = urlPrefix + '/' + request.url;
    renderHtml(url, function(html) {
        response.statusCode = 200;
        response.write(html);
        response.close();
    });
});

console.log('Listening on ' + port + '...');
console.log('Press Ctrl+C to stop.');
And start it under screen with phantomjs:
screen -d -m phantomjs --disk-cache=no server.js 8888 http://example.com
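To check that the daemon responds, you can ask it for a snapshot directly; a minimal sketch using Node's built-in http module (the port and the example route are just the values used above, and the response takes about 10 seconds because of the render timeout in server.js):
// Request a snapshot straight from the phantomjs daemon and print its size.
var http = require('http');

http.get('http://127.0.0.1:8888/?_escaped_fragment_=/cats/grumpy-cat', function(res) {
    var body = '';
    res.on('data', function(chunk) { body += chunk; });
    res.on('end', function() {
        console.log('Got snapshot, ' + body.length + ' bytes');
    });
});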
Next, let's configure nginx (apache is similar) to proxy these requests to the running daemon:
server {
    ...
    if ($args ~ "_escaped_fragment_=(.+)") {
        set $real_url $1;
        rewrite ^ /crawler$real_url;
    }

    location ^~ /crawler {
        proxy_pass http://127.0.0.1:8888/$real_url;
    }
    ...
}
Now when example.com/cats/grumpy-cat is requested, search engines will use the link example.com/?_escaped_fragment_=cats/grumpy-cat, which is intercepted by nginx and passed on to phantomjs, which renders the html on the server with its browser engine and returns it to the robot.
Besides the search engines Google, Yandex and Bing, this will also work for sharing links on Facebook.
Links:
https://developers.google.com/webmasters/ajax-crawling/docs/getting-started
https://help.yandex.ru/webmaster/robot-workings/ajax-indexing.xml
UPD (2.12.16):
Configs for apache2 from kot-ezhva:
If you are using html5mode:
RewriteEngine on
RewriteCond %{QUERY_STRING} (.*)_escaped_fragment_=
RewriteRule ^(.*) http://127.0.0.1:8888/$1 [P]
ProxyPassReverse / http://127.0.0.1:8888/
If the URLs use the hash sign (#!):
RewriteEngine on
RewriteCond %{QUERY_STRING} _escaped_fragment_=(.*)
RewriteRule ^(.*) http://127.0.0.1:8888/$1 [P]
ProxyPassReverse / http://127.0.0.1:8888/