Database Version Control with Git

JSCS

I need help understanding the workflow for database version control because what I am doing now is not working.

When I am on a branch (1), and about to commit, I do a Drush @alias sql-dump, which creates a dump file in a particular directory in my repo and uses a specific name, per instructions in the aliases.drushrc.php file. Then I commit and the sql-dump is added to the commit. I see the messages and everything is working fine.

Then I create another branch (2) and do some work. I know I am working on the same SQL database, but I have made some changes to it. I do a sql-dump and commit the changes for this branch (2).

Then I check out the first branch (1). When I look in the working directory, it only shows the sql-dump I just made on branch (2) before switching! It has the timestamp of the branch (2) sql-dump. The branch (1) sql-dump file has been overwritten, it seems, and is nowhere to be found.

I have tried everything I know. When I check out different branches, I expect the sql-dump file to have the same name but a different timestamp on each branch, and it doesn't.

If I run drush sqlc < path/sqldump.sql, it only confirms that I am on the last sql-dump, not the one I saved with the git commit.

I am stumped.

Comments

dragonwize

If you change a file and do NOT commit that file, then when you change branches that file will still be there, and if you run git status it shows up as the new file. The only way to have 2 different files in 2 different branches that get switched out when you change branches is if both of those files are committed to their respective branches.

This is because git, like most scm, doesn't blow away any modified files that have changes that have not yet been committed when you switch branches in the same folder.
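
A minimal sketch of the point above, assuming a throwaway repository in a temp directory. The branch names, the dump.sql file name and the demo identity are made up for illustration; the "dump" is just an echoed line rather than a real drush sql-dump.

```shell
#!/bin/sh
# Sketch: a tracked file only differs per branch once it is committed on each branch.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # hypothetical identity for the demo
git config user.email "demo@example.com"

echo "dump from branch1" > dump.sql
git add dump.sql
git commit -q -m "branch1 dump"
git branch -M branch1                     # give the first branch a known name

git checkout -q -b branch2
echo "dump from branch2" > dump.sql       # modified, but NOT committed yet

# The uncommitted edit travels with the working directory.
git checkout -q branch1
uncommitted_view=$(cat dump.sql)

# Commit the edit on branch2, then switch back to branch1.
git checkout -q branch2
git commit -q -am "branch2 dump"
git checkout -q branch1
committed_view=$(cat dump.sql)

echo "before committing on branch2, branch1 shows: $uncommitted_view"
echo "after committing on branch2, branch1 shows:  $committed_view"
```

Before the commit, branch1 still shows the branch2 edit; after it, each branch gets its own version back on checkout.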

I am committing the SQL DB

JSCS

I am committing sql-dump files separately for each branch. But only the last one shows up in the working directory, regardless of which branch I check out or how far I move HEAD back, etc. They are not being filtered by the .gitignore file.

James Sinkiewicz
Drupal Site Builder and Generalist
http://MyDrupalJourney.com

dragonwize

I can't help much without being there but all I can tell you is that it sounds like you did not commit the last sql file dump to the new branch before switching branches.

tommyent

Not sure that I completely understand but committing the database in branch2 would make it unavailable in branch1. You would need to merge that back into branch1. As for the timestamp I would think that is just as far as the OS is concerned.

Look into cherry picking http://schacon.github.com/git/git-cherry-pick.html

JSCS

Since the database is dumped to the same location on each branch and committed, once you change branches, drush sqlc < path/sqldump.sql is supposed to pick up the file that is in version control.

I am making progress. It looks like, as I move from branch to branch and up and down the timeline, the dump file (and, interestingly, the parent directory and all its other files) changes in size. If I check out my initial commit from days ago, the dump file in the working directory has a current timestamp from when I checked it out, not the time it was committed a few days ago... but the reported file size has changed.

More testing...


tommyent

Files committed only on one branch are not going to show in another branch. I assume the timestamp reflects the last time the file was altered as far as the OS is concerned.
So if I understand correctly, you work in branch1, then dump the database, commit the dump, and switch to branch2. In order for that dump from branch1 to be accessible in branch2, you will need to merge that dump or check it out. Otherwise branch2 will have the last dump you committed while working in branch2.

Check out branch2, make a test.txt file, commit it, then switch to branch1: that test.txt should not be there. Maybe that will help make sense of it.
So if you are in branch1 and want the database from branch2, I think you can do this:

git checkout branch1
git checkout branch2 -- path/to/file.sql

Forgive me if I'm not understanding
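
The two suggestions above can be tried in a scratch repository. This is only a sketch: the branch names, dump.sql, test.txt and the demo identity are invented, and the dump is a plain echoed line.

```shell
#!/bin/sh
# Sketch: a file committed only on branch2 is absent on branch1,
# and a single file can be borrowed from another branch.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # hypothetical identity for the demo
git config user.email "demo@example.com"

echo "dump from branch1" > dump.sql
git add dump.sql
git commit -q -m "branch1 dump"
git branch -M branch1

git checkout -q -b branch2
echo "dump from branch2" > dump.sql
echo "only on branch2" > test.txt
git add dump.sql test.txt
git commit -q -m "branch2 dump and test file"

# Back on branch1, test.txt is gone and dump.sql is the branch1 version.
git checkout -q branch1
if [ -e test.txt ]; then test_file_on_branch1=yes; else test_file_on_branch1=no; fi
own_dump=$(cat dump.sql)

# Copy just the dump from branch2 into the working directory (it is also staged).
git checkout -q branch2 -- dump.sql
borrowed_dump=$(cat dump.sql)
staged=$(git diff --cached --name-only)

echo "test.txt present on branch1: $test_file_on_branch1"
echo "branch1 dump: $own_dump"
echo "after borrowing from branch2: $borrowed_dump (staged: $staged)"
```

Note that the borrowed file ends up staged on branch1, so it would go into the next commit there unless it is reset.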

My "Wrong" thinking

JSCS

After several hours I have come to the conclusion that Git is "working" as designed, just not to my satisfaction. Being new to Git, it seemed reasonable to assume that if I worked on a fileA in one Branch (1), committed those changes and then worked on another Branch, regardless of what I did there, when I came back to Branch (1) 5 days later, my fileA would still be there in the same condition as I left it and have a timestamp from 5 days ago.

Apparently this is just "wrong" thinking. Git compares file changes in the working directory from one branch checkout to another. If two identical tracked files in their respective working directories were not touched in either Branch, Git will leave it alone. Makes sense to me!

However, if a file is tracked between branches, and it is changed in any branch, when you switch branches, Git will present the contents of the file the way you left it, but change the timestamp to the current time. This is true for directories as well.

I discovered in my database issue that I only thought my changes were not being saved from past commits because the timestamp kept changing as I moved back and forth between checkouts. Since I was using the same SQL file for my DB in both branches, Git was changing the timestamp every time I switched branches.

Finally I looked at the actual SQL-DUMP files and saw that the changes were being preserved.

This timestamp changing behavior is by design for some Make/Build responsiveness reason that I cannot yet understand. It helps Git be as fast as it is, by not doing something it would have to if the timestamps were kept intact. It's beyond me.
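
Since the file's timestamp in the working directory only says when Git last wrote it, one way to see when a dump was actually committed is to ask Git rather than the filesystem. A rough sketch, assuming a scratch repository; the file name, branch names, demo identity and the fixed commit date are all made up so the result is repeatable.

```shell
#!/bin/sh
# Sketch: the working-directory mtime changes on checkout,
# but the content and the recorded commit date do not.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # hypothetical identity for the demo
git config user.email "demo@example.com"

echo "dump from branch1" > dump.sql
git add dump.sql
# Pretend this commit was made on 2012-04-22 12:00 UTC
# (dates given in Git's "<epoch seconds> <utc offset>" format).
GIT_AUTHOR_DATE="1335096000 +0000" GIT_COMMITTER_DATE="1335096000 +0000" \
    git commit -q -m "branch1 dump"
git branch -M branch1

git checkout -q -b branch2
echo "dump from branch2" > dump.sql
git commit -q -am "branch2 dump"

# Switching back rewrites dump.sql, so ls shows a fresh timestamp ...
git checkout -q branch1
ls -l dump.sql

# ... while Git still knows what was committed and when.
content=$(cat dump.sql)
commit_date=$(git log -1 --format=%cd --date=short -- dump.sql)
echo "content: $content"
echo "last committed on this branch: $commit_date"
```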

Anyway, thanks for the help, and if you want to see this for yourself, here is an "example case."

$ git init
$ touch filea fileb
$ ls -l
total 0
-rw-r--r--    1 JSCS     Administ        0 Apr 27 21:00 filea
-rw-r--r--    1 JSCS     Administ        0 Apr 27 21:00 fileb

$ ls -l > filea    # this is the last time master writes to this file
$ ls -l
total 1
-rw-r--r--    1 JSCS     Administ      132 Apr 27 21:00 filea
-rw-r--r--    1 JSCS     Administ        0 Apr 27 21:00 fileb

$ git add .
$ git commit -m "first commit"
[master (root-commit) 159c1e6] first commit
 1 file changed, 3 insertions(+)
 create mode 100644 filea
 create mode 100644 fileb

$ git status
# On branch master
nothing to commit (working directory clean)

$ touch fileb -t 200912301000
$ ls -l
total 1
-rw-r--r--    1 JSCS     Administ      132 Apr 27 21:00 filea    # still the same time on master
-rw-r--r--    1 JSCS     Administ        0 Dec 30  2009 fileb

$ git status
# On branch master
nothing to commit (working directory clean)

$ git checkout -b dev
Switched to a new branch 'dev'

$ ls -l
total 1
-rw-r--r--    1 JSCS     Administ      132 Apr 27 21:00 filea
-rw-r--r--    1 JSCS     Administ        0 Dec 30  2009 fileb

$ ls -l > filea    # writes to filea
$ ls -l
total 1
-rw-r--r--    1 JSCS     Administ      132 Apr 27 21:10 filea    # dev file time advances
-rw-r--r--    1 JSCS     Administ        0 Dec 30  2009 fileb

$ git status
# On branch dev
# Changes not staged for commit:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified: filea
#
no changes added to commit (use "git add" and/or "git commit -a")

$ git commit -am "First Dev Commit"
[dev a5b0c75] First Dev Commit
 1 file changed, 2 insertions(+), 2 deletions(-)

$ git status
# On branch dev
nothing to commit (working directory clean)

$ ls -l
total 1
-rw-r--r--    1 JSCS     Administ      132 Apr 27 21:10 filea    # still the same dev time on the file
-rw-r--r--    1 JSCS     Administ        0 Dec 30  2009 fileb

$ git checkout master
Switched to branch 'master'

$ ls -l
total 1
-rw-r--r--    1 JSCS     Administ      132 Apr 27 21:16 filea    # timestamp has now advanced even though master never touched this file since the last commit
-rw-r--r--    1 JSCS     Administ        0 Dec 30  2009 fileb

I see another mydrupaljourney blog in the making!


mike stewart

Hi James, you're sooo close, and your thinking is essentially correct. Git tracks changes in files. However let's modify your test to prove whether the contents of the files change based on branch.

Git excels at tracking changes to content within files. It doesn't track empty folders. It doesn't track file permissions (except the executable bit). Git records content by hashing each file and storing a snapshot of the whole tree with every commit, along with a pointer to the parent commit; the patch (the delta) between two commits is computed from those snapshots when you ask for it. It might still be helpful to think of Git as a patch management system.
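
As a small illustration of Git identifying files by their content, here is a sketch in a scratch repository. The file names and demo identity are arbitrary; the point is only that two files with the same bytes get the same object id, and that the stored object can be read back by that id.

```shell
#!/bin/sh
# Sketch: Git names file content by its hash, independent of the file name.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # hypothetical identity for the demo
git config user.email "demo@example.com"

echo "EDIT1: first line" > filea
cp filea fileb                            # same content, different name

id_a=$(git hash-object filea)
id_b=$(git hash-object fileb)

git add filea fileb
git commit -q -m "two names, one blob"

# The commit's tree lists both names pointing at the same object id.
git ls-tree HEAD
same_blob_entries=$(git ls-tree HEAD | grep -c "$id_a")

# The stored content can be read back by id.
stored=$(git cat-file -p "$id_a")
echo "filea id: $id_a"
echo "fileb id: $id_b"
echo "stored content: $stored"
```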

So using your example as a starting point, this should show that files on separate branches are indeed 'recreated' from what was committed on each branch. Try the following in a shell (ignore the output lines shown after some commands; that's my output, for illustration):


mkdir fun-with-git
cd fun-with-git
git init

Initialized empty Git repository in /home/quickstart/websites/fun-with-git/.git/

# create files
touch filea fileb
# add content to files
echo "EDIT1: This is the first edit of FILEA from master branch" >> filea
echo "EDIT2: This is the first edit to FILEB from master branch" >> fileb
# add files to GIT INDEX/Stage/cache (or whatever else documentation seems to call it)
git add .
git status

# On branch master
#
# Initial commit
#
# Changes to be committed:
# (use "git rm --cached <file>..." to unstage)
#
# new file: filea
# new file: fileb
#

#commit files to GIT
git commit -m "First commit"
# Create and check out a new branch. Side note: a branch is essentially a bookmark pointing at a particular commit, the latest commit on that branch.
git checkout -b dev

Switched to a new branch 'dev'

echo "EDIT3: DEV This is the first edit from dev branch, and it appends FILEA" >> filea
git commit -am "DEV commit to append FILEA"
echo "EDIT4: DEV This is the first edit from dev branch, and it replaces the contents of FILEB" > filea
# oops, wrong file. ever made a mistake in real life? well, here's how to get out of it.
# Note that 'git status' often tells you your next options:

09:29:58 (dev) ~/websites/fun-with-git$ git status
# On branch dev
# Changes not staged for commit:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified: filea
#

git checkout -- filea ## undo mistake. -- see note below
echo "EDIT4: DEV This is the first edit from dev branch, and it replaces the contents of FILEB" > fileb
git commit -am "DEV commit to replace contents of FILEB"

09:32:43 (dev) ~/websites/fun-with-git$ ll
total 8
-rw-r--r-- 1 quickstart quickstart 130 2012-04-28 09:30 filea
-rw-r--r-- 1 quickstart quickstart 89 2012-04-28 09:31 fileb
git checkout master
Switched to branch 'master'
09:41:07 (master) ~/websites/fun-with-git$ ll
total 8
-rw-r--r-- 1 quickstart quickstart 58 2012-04-28 09:41 filea
-rw-r--r-- 1 quickstart quickstart 58 2012-04-28 09:41 fileb
09:42:30 (master) ~/websites/fun-with-git$ cat file*
EDIT1: This is the first edit of FILEA from master branch
EDIT2: This is the first edit to FILEB from master branch
09:42:36 (master) ~/websites/fun-with-git$ git checkout dev
Switched to branch 'dev'
09:43:20 (dev) ~/websites/fun-with-git$ cat file*
EDIT1: This is the first edit of FILEA from master branch
EDIT3: DEV This is the first edit from dev branch, and it appends FILEA
EDIT4: DEV This is the first edit from dev branch, and it replaces the contents of FILEB
09:43:24 (dev) ~/websites/fun-with-git$ cat filea
EDIT1: This is the first edit of FILEA from master branch
EDIT3: DEV This is the first edit from dev branch, and it appends FILEA

This should show that files on separate branches are indeed, 'recreated' based on the patches for that branch.
... hope this helps?

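# Notes: git checkout. I could alternatively have used: git reset --hard
# However, 'git reset --hard' discards the uncommitted changes in every tracked file, not just filea. And resetting to an earlier commit (like rebase) rewrites history, so it should be used with caution when working with others.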
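
To make the difference in the note concrete, here is a sketch in a scratch repository (file names and identity invented for the demo). It undoes one file with git checkout -- and then shows what a bare git reset --hard does to the remaining edit.

```shell
#!/bin/sh
# Sketch: 'git checkout -- <file>' restores one file;
# 'git reset --hard' throws away every uncommitted change to tracked files.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # hypothetical identity for the demo
git config user.email "demo@example.com"

echo "a1" > filea
echo "b1" > fileb
git add filea fileb
git commit -q -m "first commit"

echo "oops" > filea                       # the mistake
echo "b2" > fileb                         # an edit we actually wanted

git checkout -- filea                     # undo only the mistake
after_checkout_a=$(cat filea)
after_checkout_b=$(cat fileb)

git reset -q --hard                       # discards the wanted edit as well
after_reset_b=$(cat fileb)
commits=$(git rev-list --count HEAD)      # no commit was added or removed

echo "after checkout --: filea=$after_checkout_a fileb=$after_checkout_b"
echo "after reset --hard: fileb=$after_reset_b, commits=$commits"
```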

--
mike stewart { twitter: @MediaDoneRight | IRC nick: mike stewart }

Still working on workflow

JSCS

I am still working this workflow thingy.

Sometimes I work from my laptop and sometimes the desktop PC. The main repo is on bitbucket.

The Laptop and the PC have identical (except for user.name) .git/config, .gitconfig, and git/etc/config files.

But the results of git remote show origin differ in one area:

Laptop:
Local branches configured for 'git pull':
dev rebases onto remote dev
master rebases onto remote master

PC:
Local branches configured for 'git pull':
dev merges with remote dev
master merges with remote master

I cannot for the life of me figure out why they are different, how to fix it and which one I should fix?
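
One way to hunt for such a difference is to list every rebase-related setting together with the file it comes from, and compare the output from both machines. A sketch only: the scratch repository and the branch.dev.rebase value are set up here purely so there is something to list, and --show-origin needs a reasonably recent Git (2.8 or later, as far as I know).

```shell
#!/bin/sh
# Sketch: list rebase-related settings and where each one is defined.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical per-branch setting, like the one one of the clones might carry.
git config branch.dev.rebase true

# Settings from this repository's own config file, each prefixed with its origin.
local_settings=$(git config --local --show-origin --get-regexp rebase)
echo "$local_settings"

# In a real clone, drop --local to include ~/.gitconfig and the system-wide file too:
#   git config --show-origin --get-regexp rebase
```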


mike stewart

Sounds like your laptop is configured to "auto rebase" (in .gitconfig). At some point you probably did this on your laptop:

git config --global branch.autosetuprebase always

which is a common practice: http://randyfay.com/node/103

and some more info on why/when to rebase:
http://stackoverflow.com/questions/2472254/when-should-i-use-git-pull-re...


JSCS

That is what I thought initially too, Mike, but all my git configuration files are the same between the laptop and the PC, including the .gitconfig files, which include:

[push]
default = current
[branch]
autosetuprebase = always

So really, with that setting, the PC should also say "rebase" just like the Laptop does now, since it has the same setting.


Found the answer

JSCS

In order to get my PC to rebase, I had to add rebase = true to the .git/config file:

$ git config branch.dealer.rebase true
$ git config branch.master.rebase true

[branch "master"]
remote = origin
merge = refs/heads/master
rebase = true # added this line
[branch "dealer"]
remote = origin
merge = refs/heads/dealer
rebase = true # added this line

After double-checking the laptop's .git/config file, I saw that the two were NOT identical as I had thought. The laptop had the "rebase = true" lines.
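
A possible explanation for how the two files drifted apart, not confirmed anywhere in this thread: branch.autosetuprebase is only consulted when a tracking branch is created, so branches that already existed before the setting was added keep merging until branch.<name>.rebase is set by hand. The sketch below reproduces that in scratch repositories; the directory names, the seed repository and the throwaway HOME are all assumptions made for the demo.

```shell
#!/bin/sh
# Sketch: autosetuprebase affects only branches created after it is set.
set -e
work=$(mktemp -d)
cd "$work"

# Use a throwaway HOME so the real ~/.gitconfig is neither read nor changed.
HOME="$work/home"
export HOME
mkdir -p "$HOME"
unset XDG_CONFIG_HOME
GIT_CONFIG_NOSYSTEM=1
export GIT_CONFIG_NOSYSTEM

# Build a stand-in for the remote with two branches, master and dev.
git init -q seed
cd seed
git config user.name "Demo User"          # hypothetical identity for the demo
git config user.email "demo@example.com"
echo "hello" > readme.txt
git add readme.txt
git commit -q -m "initial"
git branch -M master
git branch dev
cd ..
git clone -q --bare seed origin.git

# "PC" clone: local master is created by the clone, before the setting exists.
git clone -q origin.git pc
cd pc
git config --global branch.autosetuprebase always

# dev is created after the setting, so it is configured to rebase.
git checkout -q -b dev origin/dev

master_rebase=$(git config --get branch.master.rebase || echo unset)
dev_rebase=$(git config --get branch.dev.rebase || echo unset)
echo "master: $master_rebase, dev: $dev_rebase"

# The manual fix from the post above, applied to master.
git config branch.master.rebase true
master_fixed=$(git config --get branch.master.rebase)
echo "master after fix: $master_fixed"
```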

Oh well.

Thanks for the chat Mike!


LA Drupal [Los Angeles Drupal]
