Project:Infrastructure/Git migration
From Gentoo Wiki
Status
Final hosting is ready. Launch planning for weekend of August 8/9.
Blockers
- Infra Manpower
- Ensure availability of final history conversion host
- Needs lots of RAM, parallel CPU and some SSD backing
- Consider a 1-month Hetzner server bidding option
- Consider a RackSpace OnMetal I/O node (if available by the hour/day)
- Consider a large AWS instance by hour
- r3.2xlarge, m4.4xlarge, c4.8xlarge; maybe even larger?
Launch plan
Steps
Top-level items in bold are considered critical path to service migration.
- Freeze
- No more CVS commits to gentoo-x86 ever again
- CVS->rsync conversion frozen
- Take backups
- Final tree snapshot
- Final CVS history backup
- Publish both
- Perform cleanups on final snapshot
- Remove ChangeLog files
- Convert to thin manifests
- Publish cleaned snapshot as reference
- Commit fixed snapshot as initial signed commit on new history
- Allow developers to clone new repo and commit to it
- Turn on git->rsync
- Manifests: Converts thin->thick
- Changelogs: (temporary) we explicitly copy the changelog-as-is from the final
- Review/fix all scripts for further breakages
- Perform history conversion
- Re-introduce cleanups in history
- The state of (history conversion + cleanups) MUST match the state of (initial commit) at this point
- Make converted history available as graft point
- Adjust git->rsync
- Re-enable true ChangeLog generation
- (maybe) Implement ChangeLog expiry mechanisms
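The requirement that (history conversion + cleanups) match the initial commit can be checked by comparing the file contents of two checkouts. This is a minimal sketch, assuming both states are available as local directories. The function names, the skipped directory names and the choice of SHA-256 are illustrative assumptions and are not taken from the migration scripts.

```python
import hashlib
from pathlib import Path


def tree_digest(root, skip_dirs=(".git", "CVS")):
    """Map each relative file path under root to the SHA-256 of its content."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Ignore VCS metadata so only the tracked tree content is compared.
        if any(part in skip_dirs for part in rel.parts):
            continue
        if path.is_file():
            digests[rel.as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def tree_diff(left, right):
    """Return (only_left, only_right, changed) as sorted lists of relative paths."""
    a, b = tree_digest(left), tree_digest(right)
    only_left = sorted(set(a) - set(b))
    only_right = sorted(set(b) - set(a))
    changed = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return only_left, only_right, changed
```

Three empty lists mean the two trees carry identical content. Any other result names the paths that need attention before the converted history is offered as a graft point.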
Tentative date and times
Date and time | Event |
---|---|
2015/08/08 15:00 UTC | Freeze |
2015/08/08 19:00 UTC | Git commits open for developers |
2015/08/09 01:00 UTC | Rsync live again (with delayed changelogs) |
2015/08/11 | History repo available to graft |
2015/08/12 | rsync mirrors carry up-to-date changelogs again |
Resources
- Richard Freeman (rich0)'s validation code: https://github.com/rich0/gitvalidate
- ferringb's generation code: git://pkgcore.org/git-conversion-tools
People
This is in a roughly chronological order, and apologies to anybody that was left out.
- Alec Warner (antarus) - did the GSoC 2006 migration tests
- Robin H. Johnson (robbat2) - infra guy, herding this project
- Nguyen Thai Ngoc Duy (pclouds) - Former Gentoo developer, wrote Git features for the migration
- Michael Haggerty - upstream cvs2svn author
- Brian Harring (ferringb) - wrote much python to improve cvs2svn
- Michael G. Schwern - Perl hacker, fixed git-svn for SVN 1.7 support
- Rich Freeman (rich0) - validation scripts
- Patrick Lauer (patrick) - Gentoo dev, running new 2014 work in migration
Contact
For Git migration discussions subscribe to gentoo-scm mailing list: gentoo-scm@lists.gentoo.org
Conversion process
Goals
- Each Git commit should be mapped to one or more CVS commits
- Portage two-phase commits (commit 1: ebuilds/files/Manifest, commit 2: Manifest regenerated from $Header$ changes, optionally GPG-signed) should be mapped to a single commit
- Portage trailer data in CVS commit log should be converted to newline format Git logs
- As the validation settles, it should become possible to have CVS commits generate known Git commit IDs
- Start list of validated commit IDs
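The two-phase commit goal can be illustrated by folding a Manifest-only follow-up commit into the commit that precedes it. This is a rough sketch, assuming the CVS revisions have already been grouped into commit records. The `Commit` type, the 300-second window and the function names are hypothetical and do not come from the conversion tools.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Commit:
    author: str
    timestamp: int  # seconds since the epoch
    message: str
    paths: List[str] = field(default_factory=list)


def is_manifest_only(commit):
    """True when every path touched by the commit is a Manifest file."""
    return bool(commit.paths) and all(
        p.rsplit("/", 1)[-1] == "Manifest" for p in commit.paths
    )


def squash_two_phase(commits, window=300):
    """Fold Manifest-only follow-ups into the previous commit by the same author.

    commits must be sorted by timestamp; window is an assumed upper bound
    (in seconds) on the gap between phase 1 and phase 2.
    """
    merged = []
    for c in commits:
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and is_manifest_only(c)
            and c.author == prev.author
            and 0 <= c.timestamp - prev.timestamp <= window
        ):
            # Phase 2: keep the phase-1 metadata, add paths not yet recorded.
            for p in c.paths:
                if p not in prev.paths:
                    prev.paths.append(p)
            continue
        merged.append(Commit(c.author, c.timestamp, c.message, list(c.paths)))
    return merged
```

A real conversion would probably also check that the Manifest belongs to the package touched in phase 1. That check is omitted here to keep the sketch short.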
Pseudocode
do {
    do {
        adjust conversion scripts
        do test conversion
        validate all newly converted commits
    } while (not validation passed on all commits)
    switch CVS to read only
    do final conversion
    final validation
    if (final validation passed) {
        activate Git repo for public commits
        lock CVS permanently
    } else {
        unlock CVS
    }
} while (still using CVS)
Historical migration
Here is how to generate the historical migration in git:
- Patch cvs2svn to use "/" as the separator in the date format in keywords. http://dev.gentoo.org/~rich0/gitmig/cvs2svn.patch
- Use the migration scripts at: https://github.com/gentoo/git-migration-scripts-rich0
- (provide list of dependencies for scripts)
- Obtain tarball of cvsroot (or squashfs - preferable for cache use)
- Place/mount cvs in cvs-repo
- Run script.sh --fast
- From git directory, run git bundle create <destpath> master
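The final bundling step can be scripted. This small sketch builds the `git bundle create <destpath> master` invocation from the list above and runs it inside the converted repository. The helper names and the injectable `run` parameter are assumptions made for illustration.

```python
import subprocess


def bundle_command(dest, ref="master"):
    """Return the argv for `git bundle create <dest> <ref>`."""
    return ["git", "bundle", "create", str(dest), ref]


def create_bundle(repo_dir, dest, ref="master", run=subprocess.run):
    """Run the bundle command from inside repo_dir; raises on a non-zero exit."""
    cmd = bundle_command(dest, ref)
    run(cmd, cwd=str(repo_dir), check=True)
    return cmd
```

Passing a different `run` callable allows a dry run that only records the command without touching the repository.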
Validation
Quick notes on how to test:
- Source for the validation scripts at: https://github.com/rich0/gitvalidate.git
- Clone the git bundle into a directory
- Extract the cvs root into a directory
- (uncertain - may need to set up local bind mounts or symlinks to match the path in the cvs keywords)
- Checkout the cvs gentoo-x86 module into another directory
- (uncertain - may need to edit config files to ensure that cvs checkouts hit the local root, and don't hit Gentoo infra - test before running the script, or watch the script and if it isn't using near 100% CPU it probably is hammering the server so stop it!)
- Use git log to obtain the hash of the last git commit
- Point TMPDIR at a location with ~10GB of space (/tmp on tmpfs may not cut it and sort will fail).
- Run gitdump/gitprocesstree.sh <path to git tree root> <head commit hash> > g
- Run cvsdump/cvsprocesstree.sh <path to gentoo-x86 in cvs root> <path to checkout of gentoo-x86> > c
- Create a table in mysql to hold the cvs output:
CREATE TABLE `cvs` (
  `key` int(11) NOT NULL AUTO_INCREMENT,
  `filename` varchar(500) COLLATE utf8_bin NOT NULL,
  `type` varchar(5) COLLATE utf8_bin NOT NULL,
  `hash` varchar(50) COLLATE utf8_bin NOT NULL,
  `timestamp` int(11) NOT NULL,
  `author` varchar(200) COLLATE utf8_bin NOT NULL,
  `message` text COLLATE utf8_bin NOT NULL,
  `revision` varchar(10) COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`key`),
  KEY `filename` (`filename`(255),`hash`),
  KEY `hash` (`hash`)
) ENGINE=MyISAM AUTO_INCREMENT=3132434 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
- Create a table in mysql to hold the git output:
CREATE TABLE `git` (
  `key` int(11) NOT NULL AUTO_INCREMENT,
  `filename` varchar(500) COLLATE utf8_bin NOT NULL,
  `type` varchar(5) COLLATE utf8_bin NOT NULL,
  `hash` varchar(50) COLLATE utf8_bin NOT NULL,
  `timestamp` int(11) NOT NULL,
  `author` varchar(200) COLLATE utf8_bin NOT NULL,
  `message` text COLLATE utf8_bin NOT NULL,
  `commit` varchar(50) COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`key`),
  KEY `filename` (`filename`(255),`hash`),
  KEY `hash` (`hash`)
) ENGINE=MyISAM AUTO_INCREMENT=3030211 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
- Define the base64 handling procedures found at http://stackoverflow.com/questions/358500/base64-encode-in-mysql
- Load the data into the tables:
load data local infile 'c' into table cvs
  fields terminated by ',' lines terminated by '\n'
  (filename,type,hash,timestamp,author,message,revision);
load data local infile 'g' into table git
  fields terminated by ',' lines terminated by '\n'
  (filename,type,hash,timestamp,author,message,commit);
- Process the data into several tables:
create table onlycvs ENGINE = MYISAM
  select cvs.* from `cvs` left join `git` as g on cvs.hash=g.hash
  where g.hash is null;
create table onlygit ENGINE = MYISAM
  select g.* from `git` as g left join `cvs` on cvs.hash=g.hash
  where cvs.hash is null;
delete from onlycvs where revision="1.1.1.1";
delete from onlycvs where filename like "%Manifest%";
delete from onlygit where filename like "%Manifest%";
create table baddate ENGINE = MYISAM
  select c.*, g.commit from `cvs` as c
  join `git` as g on (g.hash=c.hash and g.filename=c.filename)
  where abs(c.timestamp - g.timestamp) > 60*60;
create table badmessage ENGINE = MYISAM
  select c.*, g.author as gauthor, g.commit, g.message as gmessage from `cvs` as c
  join `git` as g on (g.hash=c.hash and g.filename=c.filename)
  where c.message <> g.message
    and g.filename not like "%Manifest%"
    and abs(c.timestamp - g.timestamp) < 60*60;
UPDATE `badmessage` SET
  `author`=BASE64_DECODE(`author`),
  `gauthor`=BASE64_DECODE(`gauthor`),
  `message`=BASE64_DECODE(`message`),
  `gmessage`=BASE64_DECODE(`gmessage`);
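The comparison logic can be prototyped without a database. The sketch below mirrors the `onlycvs`, `onlygit` and `baddate` queries on in-memory rows, assuming each row is a dict keyed by the column names of the two tables. The function names are illustrative, and the code is not part of gitvalidate.

```python
def only_in(left, right):
    """Rows of left whose hash never appears in right (LEFT JOIN ... IS NULL)."""
    right_hashes = {row["hash"] for row in right}
    return [row for row in left if row["hash"] not in right_hashes]


def not_manifest(row):
    """Equivalent of: filename NOT LIKE '%Manifest%' (case-sensitive)."""
    return "Manifest" not in row["filename"]


def only_cvs(cvs, git):
    """CVS-only rows, minus vendor-branch 1.1.1.1 revisions and Manifests."""
    return [r for r in only_in(cvs, git)
            if r["revision"] != "1.1.1.1" and not_manifest(r)]


def only_git(cvs, git):
    """Git-only rows, minus Manifests."""
    return [r for r in only_in(git, cvs) if not_manifest(r)]


def bad_date(cvs, git, tolerance=60 * 60):
    """Pairs matching on (hash, filename) whose timestamps differ by more than tolerance."""
    by_key = {}
    for g in git:
        by_key.setdefault((g["hash"], g["filename"]), []).append(g)
    out = []
    for c in cvs:
        for g in by_key.get((c["hash"], c["filename"]), []):
            if abs(c["timestamp"] - g["timestamp"]) > tolerance:
                # Same shape as the SQL result: all CVS columns plus the git commit.
                out.append({**c, "commit": g["commit"]})
    return out
```

Empty results from all three functions correspond to empty `onlycvs`, `onlygit` and `baddate` tables. The `badmessage` check is left out because it depends on the base64 procedures.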
History
2006
- The first major work in VCS Migration was done as a GSoC 2006 project by User:Antarus.
- Git was mostly too resource intensive at this point for serious consideration, and was slower than CVS.
- Conversion takes more than 7 days.
- Decision to stay on CVS
2007
2008
2009
- April:
- Converting a recent CVS copy - Item 1: mailmap fun
- Converting a recent CVS copy - Item 2: statistics
- Conversion time: 18.5 hours
- June:
- Progress summary, 2009/06/01
- Conversion time: 9 hours
- Bug in cvs2svn/cvs2git causes lines of files to be lost
- ExternalBlobGenerator module created by upstream author, originally closed source, and non-public: improves pass1 from 36204 seconds to 1598 seconds
- October: Gentoo meeting at the GSoC Mentor Summit
- All Gentoo developers present held a meeting; one of the major topics was the blockers and plans for the Git migration.
- Shawn Pearce, one of the major Git developers, and author of the Repo tool.
- Decision between a monolithic repo, per-category repos and per-package repos: the monolithic repo wins.
2010
- User:ferringb takes on Python improvements with snakeoil and Unladen Swallow
- Gentoo SCM conversion status report, 2010/01/27
- Conversion time: 110 minutes
- Commit Signing & Sparse Trees identified as requirements
2011
- August:
- Re: gentoo-dev Progress on cvs->git migration (status report)
- Unresolved items: commit signing, thin Manifests, merge policies
- September:
- Portage gets thin Manifest support
- October:
2012
- May-July:
- Bug #418431: (git-svn is broken with SVN 1.7 and can corrupt data) causes a hassle for Git work (part of the migration process at this time relies heavily on the cvs2svn codebase)
- October:
- Email [gentoo-scm] Fwd: [gentoo-dev] CIA replacement on 2012/10/01 by rich0.
- Bug #333531: portage migration to git (tracker bug)
- Outstanding items: pre-upload hook, git2rsync scripts, validation, documentation
- Email [gentoo-scm] CVS -> git, list of where non-infra folk can contribute on 2012/10/01 by ferringb
- Lays out the many tasks well
- http://git.stuge.se/?p=portage.git;a=commitdiff;h=thickandthin mentioned for merging, still not done?
2013
2014
- February: Progress made on some blockers (i.e. they were found to be obsolete)
- Bug #333531: portage migration to git (tracker bug)
- Major outstanding items:
- Wait for jk/pack-bitmap to land in a git release (pack-bitmap landed in git 2 release)
- Enforce GPG commit signing
- Get gitolite to log to syslog
- March: GLEP 63 - Minimum requirement and a recommended set of GPG key management policies for the Gentoo Linux distribution.
- May: Gentoo Keys: Tool that manages GPG key validation/updates and performs multiple "health" checks on GPG keys
- October: Regular test migrations happening, based on 2014/09/15 snapshot:
See also
- Grafting converted CVS commits to the new gentoo.git repository - Instructions for grafting (converted) historical CVS commits onto the gentoo.git history. Helpful if you want to look through the full git log.