Hello
I have a test Drupal web site.
Most of the content is in Slovak.
I have created a story with title: Test of slovak charakters in Title (ľ,š,č,ť,ž,ý,á,í,é,ú,ä,ô,Ľ,Š,Č,Ť,Ž,Ý,Á,Í,É,Ú)
I entered this story by going to Administer/Create Content/Story, switching my keyboard to SK - Slovak (Slovakia) and typing the Slovak characters.
Look at http://www.sk-bc.ca/slovo/?q=node/190
I need to export the Drupal DB.
My ISP does not offer phpMyAdmin, so I had to write my own PHP scripts to export the MySQL table structures and data.
The script:
* extracts data from DB tables, e.g.: select * from node
* generates INSERT INTO statement, e.g.: INSERT INTO node ...
* creates a text file with the INSERT INTO statements
* offers the client the choice to Open or Save the text file with the INSERT INTO statements
When I extract data from the DB table, every column/value/attribute is processed like this:
$AttrValue = ....taken from the database table ....
$SlashAttrValue = addslashes($AttrValue);
if (mb_detect_encoding($SlashAttrValue) == 'UTF-8') {
    $Utf8SlashAttrValue = $SlashAttrValue;
} else {
    $Utf8SlashAttrValue = utf8_encode($SlashAttrValue);
}
echo ($Utf8SlashAttrValue);
When I generate a text file I use
header("Content-disposition: filename=Backup.data");
header("Content-type: application/octet-stream; charset=utf-8");
header("Pragma: no-cache");
header("Expires: 0");
The file was generated.
When I open it in a text editor (Notepad), it shows:
...
INSERT INTO D5_5_slovo_node (nid, vid, type, title, uid, status, created, changed, comment, promote, moderate, sticky, language, tnid, translate) VALUES
...
('190', '199', 'story', 'Test of slovak charakters in Title (?,š,?,?,ž,ý,á,Ã,é,ú,ä,ô,?,Š,?,?,ÂŽ,Ã,Ã,Ã,É,Ú)', '1', '1', '1239465107', '1239465107', '2', '0', '0', '0', '', '0', '0');
When I open it in MS Word (Vista), it asks for the encoding. I specify Unicode (UTF-8), and it shows:
('190', '199', 'story', 'Test of slovak charakters in Title (?,,?,?,,ý,á,í,é,ú,ä,ô,?,,?,?,,Ý,Á,Í,É,Ú)', '1', '1', '1239465107', '1239465107', '2', '0', '0', '0', '', '0', '0');
Obviously, some Slovak characters have been exported correctly and some have not.
Do you have any idea why?
Thank you for your time.
Comments
Ahoj Jozo, You should be
Ahoj Jozo,
You should be able to export from mysqldump using:
mysqldump --default-character-set=charset_name
where charset_name is set to utf8 (latin2, MySQL's name for ISO-8859-2, may also be worth trying).
I don't know whether I would trust Word as the definitive test of whether your database has been successfully exported. Instead, I would try reimporting the data into a new MySQL database, since ultimately that will be the true measure of its usefulness as a backup.
Vdaka Ben Thank you for your
Vdaka Ben
Thank you for your answer.
I do not have control over the MySQL server, and I cannot use command-line commands.
All I can do is access the MySQL DB from their web server (from my root directory) using PHP.
That is why I had to write my own "backup" scripts and download a backup text file to my local machine before importing it into a new MySQL DB.
Yes, you are right about Word, but I wanted to rule out possible problems when importing the data into a new MySQL DB. What tool should I use to see if the content of the text file is (kind of) ok?
Btw, I also imported the content to a new MySql DB (see http://www.xeuro.web.aplus.net/) and the problem can be seen over there.
So the process consists of 3 steps
* export of the text file with DROP, CREATE and INSERT INTO from ISP1 to my local machine
* manipulating the text file on my local machine (deleting non-Drupal tables from the text file)
* import of the text file with DROP, CREATE and INSERT INTO to ISP2
It looks like the problem is in my PHP scripts in the first step, mainly in the "header(.... " instructions, but I do not know where.
Jozo
table collations and backup_migrate module
Hi Jozo
You should probably check that the ISP has created the database tables with the utf8_general_ci collation, and not with a Latin-based collation.
Also, try out the Backup and Migrate module (http://drupal.org/project/backup_migrate), which lets you handle backups from the Drupal admin. Maybe the script you are using is not handling all cases?
good luck with it,
-M
Thank you, Mackh
Thank you very much, Mackh, for advising the backup_migrate module.
That was exactly what I needed.
I installed it at ISP1, used the configuration to exclude unwanted tables and content, downloaded the file, and changed the table prefixes. Then I imported the tables into the ISP2 DB using phpMyAdmin, and it works correctly :)
Thank you again
Ask and set PHP and Database's encodings, don't autodetect
Jozo:
Your site looks very interesting! A Czech site about British Columbia, that is something I like to know about!
I agree with Beanjammin that the ultimate test is what you get when you import the SQL file into your new database. If the characters are intact in the new database, it doesn't matter so much what word processors say about the encoding of the intermediate SQL file. What do you see when you import the SQL file into ISP2's database?
The details of how PHP and your database handle text encoding depend on the version of PHP, the kind of database, and the database software version you use. Which are they?
I'm not sure that it's the best idea to call PHP function mb_detect_encoding() on each string from the old database. It would be a much better idea to know what encoding the source database uses, and set PHP to use UTF-8 unambiguously.
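To illustrate the point about knowing the source encoding, here is a minimal sketch. It assumes the bytes coming from the database are ISO-8859-2 (Latin-2), which may not be true for your server, and the helper name toUtf8() is made up for this example. Note that utf8_encode() always treats its input as ISO-8859-1, so it cannot produce the right result for Slovak letters that do not exist in ISO-8859-1.

```php
<?php
// Hypothetical helper: convert a byte string from a KNOWN source encoding
// to UTF-8 instead of guessing the encoding per string.
function toUtf8(string $bytes, string $sourceEncoding): string
{
    // Text that is already declared as UTF-8 is passed through unchanged.
    if ($sourceEncoding === 'UTF-8') {
        return $bytes;
    }
    return mb_convert_encoding($bytes, 'UTF-8', $sourceEncoding);
}

// Assumption: "š" arrives as the single byte 0xB9, as it would in ISO-8859-2.
$latin2 = "\xB9";

// Converted with the correct source encoding, it becomes the UTF-8 bytes of "š".
echo bin2hex(toUtf8($latin2, 'ISO-8859-2')), "\n"; // c5a1

// Read as ISO-8859-1 (what utf8_encode() assumes), the same byte becomes "¹".
echo bin2hex(toUtf8($latin2, 'ISO-8859-1')), "\n"; // c2b9
```

The second line of output shows how a wrong assumption about the source encoding silently produces a different character rather than an error.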
PHP function mysql_client_encoding() tells your PHP code the value of the character_set variable from MySQL. If you are using a different database, there are probably similar functions for them. If all your strings use the same encoding as the database encoding, then you know the encoding of all incoming strings.
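As a rough sketch of setting the connection encoding explicitly rather than only reading it: the example below uses the mysqli extension, because the older mysql_* functions were removed in PHP 7. The function name openUtf8Connection() and all connection parameters are placeholders, and 'utf8' is an assumption about what your server supports (newer servers usually prefer 'utf8mb4').

```php
<?php
// Hypothetical helper: host, user, password and database name are placeholders.
function openUtf8Connection(string $host, string $user, string $pass, string $db): mysqli
{
    $link = new mysqli($host, $user, $pass, $db);
    if ($link->connect_errno) {
        throw new RuntimeException('Connect failed: ' . $link->connect_error);
    }
    // Ask the server to send and receive text as UTF-8 on this connection,
    // so every string PHP reads has one known encoding.
    if (!$link->set_charset('utf8')) {
        throw new RuntimeException('Could not set charset: ' . $link->error);
    }
    return $link;
}

// Usage (not executed here, since it needs a real server):
// $link = openUtf8Connection('localhost', 'user', 'secret', 'drupal');
// echo $link->character_set_name(), "\n"; // reports the connection charset
```

With the connection charset fixed this way, there is no need to detect the encoding of each value before writing it to the export file.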
The PHP function mb_internal_encoding() (http://docs.php.net/manual/en/function.mb-internal-encoding.php) will set the PHP interpreter's internal encoding to UTF-8. It looks like you want your SQL file encoded in UTF-8, which is a wise choice, so it helps to have PHP use UTF-8 internally. If your database is using a different encoding, PHP will likely automatically convert text to PHP internal encoding (at least, that's my understanding of how PHP 6 will do it).
The way I check the encoding of a text file if I really want to be sure about it, is to open the file with a binary text editor and look at the byte values myself. The UTF-8 values for "ľ,š,č,ť,ž" should be "C4 BE 2C C5 A1 2C C4 8D 2C C5 A5 2C C5 BE" for example.
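If no binary editor is at hand, the same byte check can be sketched in PHP. The helper name hexBytes() is invented for this example; the sample string is built with "\u{...}" escapes (PHP 7 and later) so that it is UTF-8 regardless of how the script file itself is saved.

```php
<?php
// Return the bytes of a string as upper-case hex pairs separated by spaces.
function hexBytes(string $bytes): string
{
    return strtoupper(implode(' ', str_split(bin2hex($bytes), 2)));
}

// The characters ľ,š,č,ť,ž written as Unicode escapes, which PHP emits as UTF-8.
$sample = "\u{013E},\u{0161},\u{010D},\u{0165},\u{017E}";
echo hexBytes($sample), "\n";
// C4 BE 2C C5 A1 2C C4 8D 2C C5 A5 2C C5 BE

// mb_check_encoding() reports whether a byte string is well-formed UTF-8.
var_dump(mb_check_encoding($sample, 'UTF-8')); // bool(true)
var_dump(mb_check_encoding("\xB9", 'UTF-8'));  // bool(false): a lone Latin-2 "š" byte
```

To inspect the downloaded export, the same two functions could be applied to the result of file_get_contents() on the saved file, for example Backup.data.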
Does this help?
Please keep me informed with how things go for this site. I will eventually want to include it in my B.C. Polyglot [Web Sites] Directory.
—Jim DeLaHunt, multilingual websites consultant, Vancouver, Canada
Small correction, not Czech,
Small correction, not Czech, it is a Slovak site about British Columbia:)
In the meantime, based on Mackh's recommendation, I installed the backup_migrate module and used it to export the database. I imported it into the new site and it works.
This site is, and will be accessible at www.sk-bc.ca (Slovakia-BritishColumbia.Canada)
Thank you for your recommendations
@Jim I'm having an issue with
@Jim
I'm having an issue with the Backup & Migrate module where odd characters (ex: ’) got into the database both when I migrated and when I restored the DB from a backup .mysql file. It would seem that the file isn't being saved as UTF-8 but as ASCII.
I did a test by downloading the backup file and noticed that the garbled characters appeared again in the file, but if I opened the file in Notepad, saved it as UTF-8 and then opened it again, the characters were properly encoded.
You mentioned setting "the PHP interpreter's internal encoding to UTF-8." - how can this be done?
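For reference, a minimal sketch of the call Jim mentioned, assuming the mbstring extension is available. It only changes how the mb_* string functions interpret text; it does not by itself convert data read from the database or written to a file.

```php
<?php
// Tell the mbstring functions to treat strings as UTF-8 by default.
mb_internal_encoding('UTF-8');

// Called without arguments, the function reports the current setting.
echo mb_internal_encoding(), "\n"; // UTF-8

$s = "\u{0161}";          // "š", which is two bytes in UTF-8
echo strlen($s), "\n";    // 2 (counts bytes)
echo mb_strlen($s), "\n"; // 1 (counts characters, using the internal encoding)
```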
@joncianciullo Did you manage
@joncianciullo
Did you manage to export the database with the file encoding set to UTF-8? I'm having the same exact issue here. Can you please post the solution here if you have solved it?
Thank you.