Posted by dkeays on July 20, 2010 at 8:21pm
How are people making backups on MySQL beyond mysqladmin export, mysqldump, or export/migrate?
Can binary logs be told in the mysqld.cfg file to ignore tables?
Comments
I like the Backup & Migrate module
Yep, so do I. But I don't want to be doing that manually every day, which the customer may need.
independent developer
You can set a schedule so that it will back up automatically.
I don't believe you can
From the documentation I've seen on MySQL, I don't believe there is a way to ignore tables for binary logging (you can, however, ignore databases). If you cannot ignore tables, then I would recommend creating a secondary database which has the tables you wish to ignore, and using the table prefix setting in your settings.php file to point at that database for any of those tables. That should solve your issues.
Are you planning on doing manual backups or via a cron task?
Generally, I've used mysqldump to back up my databases. I wrote a script that backs up each table in the database to a separate SQL file (so I can restore certain pieces if necessary and ignore others). I've also had success with maatkit (http://www.maatkit.org/) and its backup procedure (though maatkit does a LOT more than just backups), especially on very large databases.
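For anyone wanting to try the one-file-per-table approach, here is a minimal sketch in Python. It is not the script mentioned above (that was not posted); the database name, table names, and output directory are hypothetical, and it only builds the commands so you can inspect them before running anything.

```python
import os
import subprocess


def per_table_dump_commands(database, tables, out_dir):
    """Build one mysqldump command per table, each writing its own .sql file."""
    commands = []
    for table in tables:
        out_file = os.path.join(out_dir, f"{database}.{table}.sql")
        commands.append([
            "mysqldump",
            "--single-transaction",       # consistent snapshot for InnoDB tables
            f"--result-file={out_file}",  # write the dump straight to this file
            database,
            table,
        ])
    return commands


def run_all(commands):
    """Run each command in order; raises CalledProcessError on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `per_table_dump_commands("drupal", ["node", "users"], "/var/backups/mysql")` returns two command lists; passing them to `run_all` would execute them. Credentials are assumed to come from the usual MySQL option file rather than the command line.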
Completely forgot about it, but Backup and Migrate is also an excellent option (I am unsure what happens to performance on very large sites, however).
Cron jobs: monthly mysqldump, weekly mysqldiff, and daily binary logs.
I've set mysqldump to ignore all cache and sessions tables in the cfg file and thought it would be nice to do the same for the binary logs. I still have to learn more about the binary logs and restoring from them.
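As a rough illustration of the mysqldump side of this, the sketch below builds a command that skips a list of tables using mysqldump's `--ignore-table=db.table` option. It passes the option on the command line rather than through the config file, and the database and table names are only examples, not a complete list of Drupal's cache tables.

```python
def dump_command_ignoring(database, ignored_tables):
    """Build a mysqldump command that skips the given tables entirely."""
    cmd = ["mysqldump", "--single-transaction"]
    for table in ignored_tables:
        # mysqldump expects the table qualified with its database name.
        cmd.append(f"--ignore-table={database}.{table}")
    cmd.append(database)
    return cmd


# Hypothetical names for illustration only.
example = dump_command_ignoring("drupal", ["cache", "sessions"])
```

Note that an ignored table is left out of the dump completely, structure included, so the empty tables have to be recreated some other way after a restore.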
independent developer
Is there a reason for not doing a nightly mysqldump?
You could probably set up a script to roll your mysqldumps on a monthly basis so they don't take up a lot of room. Keep in mind that MySQL binary logs can get quite large if you don't set them up correctly. I know a few others here go even further and do MySQL backups every few hours. You can then keep your binary logs for a shorter amount of time.
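A minimal sketch of the "rolling" part, assuming you already know when each dump was taken (for example from the file's modification time). The 30-day retention window and the file names are assumptions for illustration; the function only reports what it would delete.

```python
from datetime import datetime, timedelta


def dumps_to_delete(dump_times, now, keep_days=30):
    """Return the names of dumps older than keep_days.

    dump_times maps a dump file name to the datetime it was taken.
    """
    cutoff = now - timedelta(days=keep_days)
    return sorted(name for name, taken in dump_times.items() if taken < cutoff)
```

Keeping the selection separate from the actual deletion makes it easy to dry-run the rotation from cron before letting it remove anything.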
Here is a good article on point-in-time recovery: http://dev.mysql.com/tech-resources/articles/point_in_time_recovery.html
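The general shape of point-in-time recovery is: restore the last full dump, then replay the binary log events recorded between that dump and the moment just before things went wrong. A hedged sketch of building the replay command with mysqlbinlog's `--start-datetime` and `--stop-datetime` options follows; the log file names and timestamps are hypothetical.

```python
def binlog_replay_command(binlog_files, start, stop):
    """Build a mysqlbinlog command that emits events between two timestamps.

    start and stop are strings such as "2010-07-20 09:00:00".
    """
    cmd = [
        "mysqlbinlog",
        f"--start-datetime={start}",
        f"--stop-datetime={stop}",
    ]
    cmd.extend(binlog_files)  # logs must be listed in the order they were written
    return cmd


# Hypothetical file names and times for illustration only.
example = binlog_replay_command(
    ["mysql-bin.000041", "mysql-bin.000042"],
    "2010-07-19 02:00:00",
    "2010-07-20 09:59:59",
)
```

The output of that command is SQL text, which is normally piped into the `mysql` client against the restored database.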
No, there isn't any reason other than I was thinking wrong. I might need more if this site takes off (which we plan on happening).
independent developer
EC2
I'm an Amazon EC2 user, and I adapted this for a lot of my backup needs:
Thread: http://alestic.com/2009/09/ec2-consistent-snapshot
Code: https://launchpad.net/ec2-consistent-snapshot
It flushes the database, does a lock, syncs the filesystem, does an XFS freeze, and takes a snapshot of the entire filesystem. It all happens almost instantly. Really great stuff if you are on Amazon. I also periodically do a mysqldump and back it up to S3 for archival.
The client seems to be wary of putting backups in the cloud. I can see the problem if we were storing CCNs. Any comments?
independent developer
Are you sure you want to store credit card numbers?
It's not too difficult to use GPG or zip encryption to protect database backups. I'll show how to do this with Virtualmin at this year's DrupalCamp LA: http://2010.drupalcampla.com/node/261
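As a rough sketch of the GPG route (this is not the Virtualmin method from the session, just plain `gpg`), the function below builds a public-key encryption command, which suits cron jobs because no passphrase is needed at backup time. The recipient address and file path are hypothetical.

```python
def gpg_encrypt_command(dump_path, recipient):
    """Build a gpg command that encrypts a dump file to the recipient's public key."""
    return [
        "gpg",
        "--batch",                 # never prompt; suitable for cron
        "--yes",                   # overwrite an existing output file
        "--encrypt",
        "--recipient", recipient,  # key must already be in the keyring
        "--output", dump_path + ".gpg",
        dump_path,
    ]


# Hypothetical values for illustration only.
example = gpg_encrypt_command("/var/backups/mysql/drupal.sql", "backup@example.com")
```

The unencrypted dump still exists on disk after this runs, so it would need to be removed once the encrypted copy has been verified.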
storing = loco
There are financial companies that can do the storage of CC info for you. I do not have their names but have heard of them.
Keeping CC info on file affects various PCI compliance levels, and almost feels dirty.
Chris Charlton, Author & Drupal Community Leader, Enterprise Level Consultant
I teach you how to build Drupal Themes http://tinyurl.com/theme-drupal and provide add-on software at http://xtnd.us
I meant to say "IF I stored CCNs", but I don't dirty myself now.
@Cris C, are you talking about gateways like Authorize.net?
@Cris S, I'm not sure that would lull my client, but I need to learn it anyways.
independent developer
Storing partial ccn's? drushrc.php ignore list of tables
What is the PCI policy about storing the last 3 or 4 digits of the card in the web database?
Does it have to be encrypted? Does it force the same level of PCI compliance as storing the entire ccn?
What is the PCI policy about sending emails with the last 3 or 4 digits? Unencrypted, of course.
Drush 3.0 now has a config file, .drushrc.php (located in one of 4 search folders), that lists which tables the SQL commands 'backup' and 'dump' should ignore. Or something like that. I recall reading that it can skip those tables in full data dumps, or output just their CREATE statements.
Peter
LA's Open Source User Group Advocate - Volunteer at DrupalCamp LA and SCALE
My understanding is that truncation to 4 digits is one of the ways to make the PAN unreadable. It takes care of the PCI-DSS compliance without PKI encryption or a one-way hash.
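To make the idea concrete, here is a minimal sketch of truncation as I understand it: everything except the last four digits is discarded before anything is stored, so the full number cannot be recovered from the database. The function name is made up, and this is an illustration only, not compliance guidance.

```python
def truncate_pan(pan, keep_last=4):
    """Return only the last keep_last digits of a card number; the rest is dropped."""
    digits = "".join(ch for ch in pan if ch.isdigit())
    if len(digits) < keep_last:
        raise ValueError("card number has fewer digits than requested")
    return digits[-keep_last:]


def display_form(truncated):
    """Format a truncated value for receipts or emails, e.g. '**** 1111'."""
    return "**** " + truncated
```

The point is that only the value returned by `truncate_pan` is ever written to storage; masking at display time while keeping the full number in the database would not be truncation.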
My first thought about using Drush for backup is: why do it when the OS and MySQL will? The desire to K.I.S.S. tells me to avoid extra layers of abstraction whenever possible.
independent developer
4 digit confirmation and Drush early adoption-Unix power syntax
PCI issues are too important for single-source opinions to carry the day. Anyone else chiming in that unencrypted 4 digits is fine? Is there anyone who read the PCI docs, and found that explicitly stated?
What about using a file on the hard drive to encrypt ccn's, instead of doing it all in RAM? Does the PCI spec talk on the issue of a file existing for a microsecond on HD to encrypt a datafile there, that contains the ccn?
Does the PCI speak to encryption algorithms? Does it explicitly contain PGP or AES? I imagine it does AES, but PGP? GPG?
MYSQL BINARY LOGS
Using binary logs for anything other than replication and restoring after a hiccup, such as using them for Drupal backups, does not make much sense to me. Binary logs exist for very specific design reasons, and using them outside those reasons seems too complex.
DRUSH FUTURE
With the OS/MySQL approach, the tooling needs to be customized to handle Drupal databases and the exclusion list, and coded separately in its own script and startup files. Drush has this built in and already working, with a team upgrading it at no cost to the local IT department, which has its advantages, TCO-wise. Rolling your own scripts is something I would like to get out of. Upgrades to surrounding APIs that create unknown incompatibilities are a liability, requiring a complete test environment per Drupal install (I now have dozens, and plan on hundreds more). Unit testing can never cover the full range of desired configuration parameters. So, at some future point, an upgrade will break the home-grown script. Not that this will not happen to a 3rd-party tool, like a Drush plugin, but there are more minds doing beta testing, typing up a list of known bugs for the upgrade package, etc. Home-grown is likely to break more often than Drush scripting. There is a lot to be said for the open source model producing better quality.
I have a wish list for Drush, that would make using drush a desirable layer, a productivity aid.
I recently had to repair some drupal tables, sessions and cache_filter, by truncation at first, and then I tried the REPAIR TABLE sql command. If Drush could cycle through all drupal tables, and 3rd party tables in the database, according to an rc startup script for this purpose, and issue SQL commands against each table, and process the success or errors, into reports, detailed, summary, and single return code (per table, per group of tables, per drupal install, per owner of drupal installs, per web server running the drupal installs), that would be an exciting bulk Drupal quality assurance feature. Sure it's a plugin to Drush to do all that. It will happen, one day.
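Just to show the shape of what I mean, here is a small sketch in Python. This is not a Drush feature or plugin; `run_sql` is a hypothetical callable you would supply yourself (it takes an SQL string and returns True on success), and the table names are examples.

```python
def repair_tables(tables, run_sql):
    """Issue REPAIR TABLE for each table and collect a per-table report.

    Returns (report, return_code): report maps table name to a status string,
    and return_code is 0 only if every table succeeded.
    """
    report = {}
    for table in tables:
        sql = f"REPAIR TABLE `{table}`"
        try:
            report[table] = "ok" if run_sql(sql) else "failed"
        except Exception as exc:  # keep going so one bad table does not stop the run
            report[table] = f"error: {exc}"
    failures = sum(1 for status in report.values() if status != "ok")
    return report, (0 if failures == 0 else 1)
```

The per-table dictionary gives the detailed report, and the single return code is what a cron job or a wrapper looping over many installs would check.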
There are other things I would like Drush to do, and one day I will find the time to read the Issue Queue for planned and proposed features, and subscribe to those I like. And put into public review those I want, but did not find.
I have no doubt that Drush will become its own 'shell' command line interpreter, with a foreach ability. Drupal is too complex for the power of Unix data/control stream theory not to be applied to ease use at the admin layer. TCO again.
At that time, using the OS to manipulate anything in Drupal will be equivalent to using a shell CLI, and there will be no 'difference' to using Drush or the OS/MySQL for admin functions. Both would be scripts.
I'd like to see the ad hoc Drush design direction adopt rigid Unix theories for syntax, control and data streams. That would ensure future backward compatibility. This issue between Drush 2 and 3 regarding a space or a dash, new abbreviations... geesh. Smacks of band-aid design. Not good, imho.
Centralizing on a single admin tool for Drupal is a good thing. Going outside it means losing time and money, imho. Reading the writing on the wall, makes me an early adopter for Drush. That is why I would, will, use Drush for all my admin scripting. In a year or two, doing it outside of Drush, will make no sense, TCO wise.
I just hope they keep it backward compatible, as all shell scripting is (bash, sh, etc).
Peter
LA's Open Source User Group Advocate - Volunteer at DrupalCamp LA and SCALE
According to the PCI Security Standards Council (https://www.pcisecuritystandards.org/)
Technical Guidelines for Protecting Stored Payment Card Data
(https://www.pcisecuritystandards.org/pdfs/pci_fs_data_storage.pdf)
"
At a minimum, PCI DSS requires PAN to be rendered unreadable anywhere it is stored – including
portable digital media, backup media, and in logs. Software solutions for this requirement may
include one of the following:
• One-way hash functions based on strong cryptography – also called hashed index, which
displays only index data that point to records in the database where sensitive data actually reside.
• Truncation – removing a data segment, such as showing only the last four digits.
• Index tokens and securely stored pads – encryption algorithm that combines sensitive plain text
data with a random key or “pad” that works only once.
• Strong cryptography – with associated key management processes and procedures. Refer to the
PCI DSS and PA-DSS Glossary of Terms, Abbreviations and Acronyms for the definition of “strong
encryption”
"
Glossary of Terms, Abbreviations, and Acronyms Version 1.2 October 2008
https://www.pcisecuritystandards.org/pdfs/pci_dss_glossary.pdf
"
Strong Cryptography Cryptography based on industry-tested and accepted algorithms, along with
strong key lengths and proper key-management practices. Cryptography is a
method to protect data and includes both encryption (which is reversible) and
hashing (which is not reversible, or “one way”). SHA-1 is an example of an
industry-tested and accepted hashing algorithm. Examples of industry-tested
and accepted standards and algorithms for encryption include AES (128 bits
and higher), TDES (minimum double-length keys), RSA (1024 bits and higher),
ECC (160 bits and higher), and ElGamal (1024 bits and higher).
See NIST Special Publication 800-57 (http://csrc.nist.gov/publications/) for
more information.
"
Back to my opinions/understandings. Any file-level encryption would be disallowed under end-to-end encryption, which may be part of future standards. I assume what you are talking about would leave residue that could be read by software installed on the server (which is how Heartland got hit). My question is this: all the solutions I've seen involve secure encryption at the POS terminal. But online transactions don't have a POS machine, so where can the encryption start?
independent developer
Oh, you found that 'sentence' I so much dreaded, and spent sleepless nights trying to grok.
Truncation – removing a data segment, such as showing only the last four digits.
Thanks for finding that excerpt. I recall my first thoughts years ago when I first read it.
Sigh, such poor, no, bad English. Is that even a sentence? I recall reading this and not trusting it. Why? The rest of the spec has good English, and having this one place where ambiguity is introduced through bad English made me not trust it (or the authors, reviewers, editors, industry...). I will not go deeply into the three major flaws of this, ah, 'sentence,' except to say: truncation is to "remove the last digits, not show them"..., ah, right - let's look at the definitions of Truncation at http://www.google.com/search?q=define%3A+Truncation. Removing data segments is not "showing only". Of course, one could read the 'sentence' as "defining" truncation, but that does not fit with the surrounding context. It is a valid interpretation, but is it a "good" interpretation? I read several industry expert opinions of why they wrote this 'sentence' so _____.
'Practice' of the rest of the industry is what I am looking for, a second expert opinion, perhaps based upon practice reviewed by PCI audit firm. The spec, written law, is only as good as currently practiced.
I recall Heartland made me go look at my code. I modified it.
Encryption starts with https from the viewer's browser to the web server, which decrypts it, and hands it to external CGI software, which I coded to quickly grab the ccn, process it asap, encrypt it, destroy any buffers with it, and then handle just the encrypted variable thereafter. That is, I put as few code lines between getting it from Apache, validation, debiting, and storing (encrypted, truncated, or just not storing it), all in one algorithm, no function calls outside, no substring, index, or other function calls to process the ccn. I wrote it all from basic language constructs, so there was no way to easily zap the executable, or script, to change it, to subvert it. And if it was done, it had to be done in only one place, in only a dozen lines of code. The ccn did not exist in RAM unencrypted for very long. The minimum amount of time.
What I did not like, is Apache pipes the ccn in a post file to the cgi. I'd rather put my code compiled into Apache next time. And I would like control over the c++ buffers holding the IO, so I know I could overwrite those particular blocks in RAM, L1, L2, L3... paranoid? Maybe. You tell me?
Apache should come out with a standard module (builtin, not library) to handle ccn's... according to PCI, imho.
The credit card industry could pay Apache, MS/IIS, for this level of protection. But they did not.
So, does that answer your 'online' question, compared to POS?
Thanks for finding the spec quotation. Yeah, I'm bummed it was not enough to satisfy me. I've used a range of cc validation, from the very earliest days it was available, and it's gone a long way to being so much easier, web based ssl requests and answers, instead of custom software executables to install and trust. Changing the ccn to a 'transaction number' to store in your local web database, in case you want to change the charged amount, or do another purchase a few days later. But not all validation firms give that level of service.
Again the PCI authors could have made the bar higher for entering into online validation service... and made it very secure, preventing any theft, at all levels of the transaction. But they did not. Sometimes I agree their reasons were valid, but 'completeness' also has to be considered when determining validity. By leaving out of the spec that some methods are 'better' than others, that lack of completeness undermined my evaluation of the validity of the PCI enterprise as paid for by the collective credit card firms. I still wonder why they left holes in the PCI spec. For so long...
I'd hope the level of detail of future PCI changes would include sample code, and how to eliminate unnecessary buffers (buffers=1), buffer copying (buffer=off), destroy buffer content (wipe with 1's), and prevent swapping the code/data block with the ccn to slow RAM, or even HDD. And show how to keep a 2K block containing the ccn and the code that handles it from being seen outside the CPU chip. Most encryption algorithms fit in under 2K, so this is a reasonable level to expect the credit card industry to define. That would be "better."
And just to get in half a cent more...
Let's look at the last 4 digits. 3 makes sense, not 4. Why? The bank issuing the card can be guessed at based upon the billing address, so the first two digits, and the next 6 digits, have intelligent estimates for them. The last 6 digits are the actual account number. So, showing 4 digits leaves just 2 digits unknown, or 100 possible guesses.
3 digits is enough for a person to know which of several credit card numbers they used... leaving 1,000 possible guesses, or 10 times more secure.
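The counting can be checked with a small sketch. Strictly, n unknown digits give 10^n combinations (100 for two, 1,000 for three). One extra wrinkle, which is my own addition and not part of the argument above: the last digit of a card number is a Luhn check digit, so if it is among the digits shown, only one in ten of the candidates is even a well-formed number. The card prefix below is the widely used 4111... test number, chosen purely for illustration.

```python
from itertools import product


def luhn_valid(number):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:   # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def count_candidates(prefix, unknown, suffix):
    """Count all fillings of the unknown digits, and those that pass Luhn."""
    total = 10 ** unknown
    valid = sum(
        1
        for middle in product("0123456789", repeat=unknown)
        if luhn_valid(prefix + "".join(middle) + suffix)
    )
    return total, valid
```

With the last four digits shown, `count_candidates("4111111111", 2, "1111")` gives 100 raw combinations of which 10 pass the checksum; with the last three shown, `count_candidates("411111111", 3, "111")` gives 1,000 and 100. The ratio between showing 3 and 4 digits stays at a factor of ten either way.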
And now you can buy groceries under $25 and not show any form of additional Id, or even sign for it.
So, reading between the lines, a certain amount of loss can be easily handled by the cc firms. Cost effective? You bet. Would I do it that way? Only if there were other scams I got paid huge amounts to allow? Maybe. I'll know when I am in that position, I suppose. I know now how I would decide, so likely I will never get in such a position. lol
Peter
LA's Open Source User Group Advocate - Volunteer at DrupalCamp LA and SCALE
I don't believe items in a bullet list need to be complete sentences. I do think an em-dash would have been more clear than a hyphen.
My guess about why PCI left those holes open would be that maybe their logic was similar to the reason people resisted PFW's on the desktop or egress filtering on a network. The idea as I've heard it is that it is useless to try to close holes in an already compromised system, and you should instead keep the system from being compromised in the first place. Why is something that makes so much sense being taken as a rationalization for not providing multiple layers of protection?
Didn't the Heartland attack succeed because the SSL tunnel ends at the presentation layer? Is End-to-End Encryption an improvement on SSL?
independent developer
Back on the topic of home grown versus via drush
I don't really see a reason why there cannot be a combination of both. In my mind, I look at what happens when your site goes down, or some other funky error occurs (Drush currently requires a connection to your site, so it would fail in that scenario). Drush (and the set of tools built around it) is fantastic, but that is not to say there is anything wrong with running MySQL backups using tools built by the folk behind MySQL. I had mentioned maatkit above; it is DESIGNED for large databases (Drush on its own is not quite there yet, though maybe it would be if you incorporated your plugin into Drush). Running 'repair table x' in MySQL was also mentioned, along with a set of other commands. Have you tried using mysqlcheck? In the case above, both are open source (I'm sure the scripts used in Virtualmin are also quite stable, and I believe they are open source, though I cannot comment on them due to my very limited use of it), and incorporating them into your set of scripts (or plugin tools for Drush) should be easy. Rolling your own scripts being bad was mentioned... so make them available to everyone! Put them up on Google or SourceForge or even d.o! Make those sets of scripts better with others involved.
I don't even remember where I was going with this at this point.