Indexing a dynamic PHP site
Hello,
This should help anyone who needs to rewrite their PHP URLs
so that they are crawlable.
Have a good afternoon
Introduction
When writing scripts, it is extremely important to have the ability to
transfer information from one script to another. A common method to do
this is with the GET convention. Search engine Web spiders, however,
tend to ignore pages whose URL contains GET method parameters. If you’re
not sure what a GET method parameter is, here’s an example of a URL with
GET method parameters:
http://www.zend.com/mypage.php?myval=1&yourvar=2
The example URL passes two parameters to the script mypage.php: "myval"
and "yourvar" with the values 1 and 2 respectively. When a search engine
spider encounters such a URL while indexing your pages, the spider will
ignore the URL and not index that particular page.
This can have a fairly detrimental effect on how your pages are indexed
– especially if you use the GET convention in your hyperlinks often.
Today I’ll show you how to use your Web server to pass parameters to PHP
scripts in a way that fools search engines and allows your pages to be
indexed when they would otherwise be ignored.
What’s wrong with the GET method
The GET method of transferring parameters between Web pages is by far
the simplest method. It is particularly useful for passing parameters
from within HREF tags. For example, assume you have a set of articles on
your Web site and a single script that displays the articles in the
desired fashion.
If you wanted to provide a simple hyperlink to a
particular article, you would need to pass the script a parameter
telling it which article you would like to view using the GET
convention. Unfortunately, Web spiders generally ignore hyperlinks that
include parameters in the URL. This means that the page which the
hyperlink points to – as well as all pages referenced by it – will be
ignored by the Web spider indexing your site.
A spider-friendly GET gimmick
Now that you have a better understanding of the problem, let’s look at
the solution. In order for a spider to traverse (and consequently index)
a given page, the URL must be free of any appearance of parameters. But
if a given page requires parameters to function properly, what can be
done? The answer lies in the use of the $PATH_INFO environment variable,
which lets you convert a URL from…
http://www.zend.com/myscript.php?myvalue=Hello
…to a spider-friendly format:
http://www.zend.com/myscript.php/myvalue/Hello
Notice that the spider-friendly format contains no indication that there
are any parameters being passed at all. Rather, it simply looks like we
are trying to access the directory /myscript.php/myvalue/Hello on the
zend.com site, and any search engine spider that accesses
the page won’t have any trouble following the URL. Yet in reality we are
executing the script myscript.php.
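As a rough illustration of the conversion, the sketch below builds a path-style URL from a list of name/value pairs. The helper name spider_friendly_url() is hypothetical and not part of the original article; it only shows one way the hyperlinks could be generated.

```php
<?php
// Hypothetical helper (not from the article): append name/value pairs
// to a script URL as path segments instead of a query string.
function spider_friendly_url(string $script, array $params): string
{
    $url = $script;
    foreach ($params as $name => $value) {
        // rawurlencode() escapes characters such as spaces, '?' and '&'
        // that would otherwise break the path.
        $url .= '/' . rawurlencode((string) $name)
              . '/' . rawurlencode((string) $value);
    }
    return $url;
}

echo spider_friendly_url('http://www.zend.com/myscript.php', ['myvalue' => 'Hello']), "\n";
```

Generating the links through one function keeps the name/value ordering consistent across the site, which matters once the script has to parse them back.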
But what happened to your parameters?
How to GET your hidden data
Now that you have successfully hidden your parameters within what
appears to be a directory structure, how do you get them out? Whenever a
PHP script is executed with extra path data appended to the end of the
filename (as we did in the spider-friendly example above), the Web
server creates an environment variable $PATH_INFO containing this
information. PHP makes this environment variable available to your script
automatically, and you can parse it to retrieve your data. Our earlier URL…
http://www.zend.com/myscript.php/myvalue/Hello
…would populate the $PATH_INFO variable with:
/myvalue/Hello
…from which you can then parse and retrieve the passed information.
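A note on current PHP versions: a global $PATH_INFO variable was only created automatically when register_globals was enabled, and that setting was removed in PHP 5.4. The same value is available as $_SERVER['PATH_INFO']. The sketch below assumes that form; the function name get_path_info() is illustrative only.

```php
<?php
// Return the extra path data for the current request, or '' when none
// was sent. $server defaults to $_SERVER; passing an array explicitly
// makes the function easy to try out from the command line.
function get_path_info(?array $server = null): string
{
    $server = $server ?? $_SERVER;
    return isset($server['PATH_INFO']) ? (string) $server['PATH_INFO'] : '';
}

// Simulates a request for /myscript.php/myvalue/Hello
echo get_path_info(['PATH_INFO' => '/myvalue/Hello']), "\n";
```

Whether the Web server sets PATH_INFO at all depends on its configuration, so the empty-string fallback is worth keeping.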
Deciphering your data
Now that you know where your parameters are, the next step is to
decipher them into a format that PHP can use. Although there is no
required method for doing this, I’ll assume that you have formatted your
data in the following way:
/var_name/var_data/var2_name/var2_data/…
Using this method, all that is left is to:
- break the provided string every time we encounter a slash (’/’)
- create variables to associate the given names (var_name,
var2_name, etc.) with their respective values (var_data, var2_data,
etc.)
With all of this in mind, let’s look at some real code.
The script
As with many powerful techniques, the code required to create this
ability in your scripts is not difficult to develop. The process
consists of traversing an array based on the $PATH_INFO, and creating
variables based on that data. In the end, the object is to take the
URL…
http://www.zend.com/myscript.php/myvalue/Hello
…then use the data provided in the $PATH_INFO variable to construct
corresponding variables:
$myvalue = "Hello"
Code flow
- Check for the existence of $PATH_INFO
- Split $PATH_INFO into an array
- If the array has an even number of elements (counting the empty
first element, so the last name has no value), add an extra empty
element at the end to simplify the traversal in the next step
- Traverse the array and create variables based on the $PATH_INFO data
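The code flow can be sketched roughly as follows. This is a minimal sketch, not the author's script: the function name parse_path_info() and the $params array are assumptions, and the pairs are returned in an array rather than turned into variables so that a crafted URL cannot overwrite existing variables. The $vardata name follows the article.

```php
<?php
// Sketch of the code flow above. Returns name => value pairs instead of
// creating variables directly (an assumption, see the lead-in).
function parse_path_info(string $pathInfo): array
{
    $params = [];

    // Step 1: nothing to do when no extra path data was passed.
    if ($pathInfo === '') {
        return $params;
    }

    // Step 2: split on slashes. Because the string starts with '/',
    // $vardata[0] is empty and the real data starts at index 1.
    $vardata = explode('/', $pathInfo);

    // Step 3: an even element count means the last name has no value,
    // so pad the array with an empty string.
    if (count($vardata) % 2 === 0) {
        $vardata[] = '';
    }

    // Step 4: walk the array two elements at a time (name, then value).
    for ($i = 1; $i < count($vardata); $i += 2) {
        if ($vardata[$i] !== '') {
            $params[$vardata[$i]] = $vardata[$i + 1];
        }
    }

    return $params;
}

print_r(parse_path_info('/myvalue/Hello'));
```

A script would typically call it as parse_path_info($_SERVER['PATH_INFO'] ?? '') and read $params['myvalue'] where the article uses $myvalue.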
NOTE: If the $PATH_INFO variable does contain a value (if no parameters
were passed it will not be set), the first element in the $vardata array
will be empty (with the actual data starting at index 1). Therefore, it
is important to take this into account when parsing and populating
variables as we did in the above code.
A step further
In the above script, not only the values assigned to the variables but
also the variable names themselves are taken from the $PATH_INFO of the
script. This was done to show parallels between this method of passing
parameters and the GET method. However, in most cases you can assume the
names of the parameters being passed.
For example, say you would like to pass a first and last name to the
script through our $PATH_INFO method. Using the code above, the URL
would resemble the following…
http://www.mysite.com/myscript.php/first/John/last/Coggeshall
…to create the variables $first and $last and assign the values "John"
and "Coggeshall" respectively. However, when using the $PATH_INFO
method, you have more flexibility than with a GET method. The same URL
could be written in the following fashion…
http://www.mysite.com/myscript.php/John/Coggeshall
…and then the script could use the following to retrieve the data:
list($dummy, $first, $last) = explode('/', $PATH_INFO);
This would allow the script to statically define variables as necessary
for that script. Using this method, the variables $first and $last will
always be created and set to the first and second values separated by a
slash. Note that an extra variable, $dummy, must be created to absorb the
empty string before the first slash in $PATH_INFO. This can be avoided in the
following manner:
list($first, $last) = explode('/', substr($PATH_INFO,1));
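One caveat with list() is that PHP reports an undefined offset or array key when the URL carries fewer segments than expected. A possible way around this, sketched below with a hypothetical helper named positional_params(), is to pad the array to a fixed length before unpacking it.

```php
<?php
// Hypothetical helper: read a fixed number of positional values from
// the path, whatever the number of segments actually supplied.
function positional_params(string $pathInfo, int $count): array
{
    // Drop any leading slashes, then split; an empty path yields no segments.
    $trimmed = ltrim($pathInfo, '/');
    $segments = $trimmed === '' ? [] : explode('/', $trimmed);

    // Pad with empty strings and cut off extras so that exactly $count
    // values are returned, in order.
    return array_slice(array_pad($segments, $count, ''), 0, $count);
}

list($first, $last) = positional_params('/John/Coggeshall', 2);
echo "$first $last\n";
```

With this approach a request for /myscript.php/John leaves $last as an empty string, which the script can then check for explicitly.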
Final notes
It is important to point out that we are expanding the parameter passing
abilities of PHP, rather than changing them. You can use this script
to hide parameters you pass to your script, as well as pass parameters
to it with standard GET or POST methods as usual.
Because this script is so transparent, feel free to prepend it to any
script either through the auto_prepend_file directive or with a simple
include() statement.