8.2. Migrating to Git
If you have an existing codebase in another VCS but have decided to start using Git, you must migrate your project one way or another. This section goes over the importers included with Git for common systems and then demonstrates how to develop your own custom importer.
8.2.1. Importing
You'll learn how to import data from two of the bigger SCM systems used professionally, Subversion and Perforce, both because they account for the majority of the users I know of who are migrating to Git, and because high-quality tools for both systems are distributed with Git.
8.2.2. Subversion
If you read the previous section about using git svn, you can easily use those instructions to git svn clone a repository. Then stop using the Subversion server, push to a new Git server, and start using that. If you want the history, you can get it as quickly as you can pull the data out of the Subversion server (which may take a while).
However, the import isn't perfect; and because it takes so long, you may as well do it right. The first problem is the author information. In Subversion, each person committing has a user account on the system, which is recorded in the commit information. The examples in the previous section show schacon in some places, such as the blame output and the git svn log. If you want to map this to better Git author data, you need a mapping from the Subversion users to the Git authors. Create a file called users.txt that holds this mapping in the following format:
schacon = Scott Chacon <[email protected]>
selse = Someo Nelse <[email protected]>
To get a list of the author names that SVN uses, you can run this:
$ svn log --xml | grep author | sort -u | perl -pe 's/.*>(.*?)<.*/$1 = /'
That gives you the log output in XML format; you can look for the authors, create a unique list, and then strip out the XML. Obviously, this only works on a machine with grep, sort, and perl installed. Then redirect that output into your users.txt file so you can add the equivalent Git user data next to each entry.
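On a machine without grep or perl, the same extraction can be sketched in Ruby. This is only an illustration, not something git svn provides: the helper name authors_skeleton is hypothetical, and it assumes the output of svn log --xml has been read into a string and that each author appears in an <author> element.

```ruby
# Hypothetical helper: build the left-hand side of users.txt from the
# XML text produced by `svn log --xml`.
def authors_skeleton(xml)
  # Collect the text of every <author> element, then de-duplicate and sort.
  names = xml.scan(%r{<author>(.*?)</author>}m).flatten.map(&:strip).uniq.sort
  # One "name = " line per author, ready for the Git identity to be added.
  names.map { |name| "#{name} = " }.join("\n")
end
```

A small script could then print authors_skeleton($stdin.read) and be fed from svn log --xml, with the result redirected into users.txt.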
You can then provide this file to git svn to help it map the author data more accurately. You can also tell git svn not to include the metadata that Subversion normally imports, by passing --no-metadata to the clone or init command. This makes your import command look like this:
$ git svn clone http://mon-projet.googlecode.com/svn/ \
      --authors-file=users.txt --no-metadata -s my_project
Now you should have a nicer Subversion import in your my_project directory. Instead of commits that look like this:
commit 37efa680e8473b615de980fa935944215428a35a
Author: schacon <schacon@4c93b258-373f-11de-be05-5f7a86268029>
Date:   Sun May 3 00:12:22 2009 +0000

    fixed install - go to trunk

    git-svn-id: https://my-project.googlecode.com/svn/trunk@94 4c93b258-373f-11de-be05-5f7a86268029
they look like this:
commit 03a8785f44c8ea5cdb0e8834b7c8e6c469be2ff2
Author: Scott Chacon <[email protected]>
Date:   Sun May 3 00:12:22 2009 +0000

    fixed install - go to trunk

Not only does the Author field look a lot better, but the git-svn-id is no longer there, either.
You still need to do a bit of post-import cleanup. For one thing, you should clean up the odd references that git svn creates. First you'll move the tags so they're actual tags rather than strange remote branches, and then you'll move the rest of the branches so they're local.
To move the tags and make them proper Git tags, run
$ cp -Rf .git/refs/remotes/tags/* .git/refs/tags/
$ rm -Rf .git/refs/remotes/tags
This takes the references that were recorded as remote branches starting with tags/ and turns them into real (lightweight) tags.
Next, move the rest of the references under refs/remotes to be local branches:
$ cp -Rf .git/refs/remotes/* .git/refs/heads/
$ rm -Rf .git/refs/remotes
Now all the old branches are real Git branches and all the old tags are real Git tags. The last thing to do is add your new Git server as a remote and push your converted project to it. Because --all pushes branches only, push the tags with a second command:
$ git push origin --all
$ git push origin --tags
All your branches and tags should now be on your new Git server in a nice, clean import.
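The reference cleanup above (copying refs/remotes/tags into refs/tags, then the remaining refs/remotes entries into refs/heads) can also be sketched in Ruby. This is a rough equivalent of the cp and rm commands, not a tool shipped with Git: the name promote_svn_refs is hypothetical, and the sketch assumes the references exist as loose files under the repository's refs directory rather than in a packed-refs file.

```ruby
require 'fileutils'

# Hypothetical helper mirroring the cp/rm cleanup commands. It assumes
# the refs are stored as loose files under git_dir/refs.
def promote_svn_refs(git_dir)
  remotes = File.join(git_dir, 'refs', 'remotes')
  tags    = File.join(remotes, 'tags')

  # Step 1: remote "tags/*" refs become real (lightweight) tags.
  if File.directory?(tags)
    dest = File.join(git_dir, 'refs', 'tags')
    FileUtils.mkdir_p(dest)
    FileUtils.cp_r(Dir.glob(File.join(tags, '*')), dest)
    FileUtils.rm_rf(tags)
  end

  # Step 2: everything left under refs/remotes becomes a local branch.
  if File.directory?(remotes)
    dest = File.join(git_dir, 'refs', 'heads')
    FileUtils.mkdir_p(dest)
    FileUtils.cp_r(Dir.glob(File.join(remotes, '*')), dest)
    FileUtils.rm_rf(remotes)
  end
end
```

Calling promote_svn_refs('.git') from the top of the imported repository would perform both steps in order; the tags must be moved first, or they would end up as branches named tags/*.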
8.2.3. Perforce
The next system you'll look at importing from is Perforce. A
Perforce importer is also distributed with Git, but only in the
contrib section of the source code — it isn't
available by default like git svn. To run it,
you must get the Git source code, which you can download from
git.kernel.org:
$ git clone git://git.kernel.org/pub/scm/git/git.git
$ cd git/contrib/fast-import
In this fast-import directory, you should find
an executable Python script named git-p4. You
must have Python and the p4 tool installed on
your machine for this import to work. For example, you'll import
the Jam project from the Perforce Public Depot. To set up your
client, you must export the P4PORT environment variable to point to
the Perforce depot:
$ export P4PORT=public.perforce.com:1666
Run the git-p4 clone command to import the Jam
project from the Perforce server, supplying the depot and project
path and the path into which you want to import the project:
$ git-p4 clone //public/jam/src@all /opt/p4import
Importing from //public/jam/src@all into /opt/p4import
Reinitialized existing Git repository in /opt/p4import/.git/
Import destination: refs/remotes/p4/master
Importing revision 4409 (100%)
If you go to the /opt/p4import directory and run
git log, you can see your imported work:
$ git log -2
commit 1fd4ec126171790efd2db83548b85b1bbbc07dc2
Author: Perforce staff <[email protected]>
Date:   Thu Aug 19 10:18:45 2004 -0800

    Drop 'rc3' moniker of jam-2.5. Folded rc2 and rc3 RELNOTES into
    the main part of the document. Built new tar/zip balls.

    Only 16 months later.

    [git-p4: depot-paths = "//public/jam/src/": change = 4409]

commit ca8870db541a23ed867f38847eda65bf4363371d
Author: Richard Geiger <[email protected]>
Date:   Tue Apr 22 20:51:34 2003 -0800

    Update derived jamgram.c

    [git-p4: depot-paths = "//public/jam/src/": change = 3108]
You can see the git-p4 identifier in each
commit. It's fine to keep that identifier there, in case you need
to reference the Perforce change number later. However, if you'd
like to remove the identifier, now is the time to do so — before
you start doing work on the new repository. You can use
git filter-branch to remove the identifier
strings en masse:
$ git filter-branch --msg-filter '
sed -e "/^\[git-p4:/d"
'
Rewrite 1fd4ec126171790efd2db83548b85b1bbbc07dc2 (123/123)
Ref 'refs/heads/master' was rewritten
If you run git log, you can see that all the
SHA-1 checksums for the commits have changed, but the
git-p4 strings are no longer in the commit
messages:
$ git log -2
commit 10a16d60cffca14d454a15c6164378f4082bc5b0
Author: Perforce staff <[email protected]>
Date:   Thu Aug 19 10:18:45 2004 -0800

    Drop 'rc3' moniker of jam-2.5. Folded rc2 and rc3 RELNOTES into
    the main part of the document. Built new tar/zip balls.

    Only 16 months later.

commit 2b6c6db311dd76c34c66ec1c40a49405e6b527b2
Author: Richard Geiger <[email protected]>
Date:   Tue Apr 22 20:51:34 2003 -0800

    Update derived jamgram.c
Your import is ready to push up to your new Git server.
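For comparison with the sed expression used in the message filter, here is the same line-dropping rule as a minimal Ruby sketch. The function name strip_git_p4 is hypothetical; it assumes, as the sed pattern does, that the identifier sits on a line of its own beginning with "[git-p4:".

```ruby
# Hypothetical Ruby counterpart of the sed filter: drop every line of a
# commit message that starts with "[git-p4:" and keep the rest untouched.
def strip_git_p4(message)
  message.lines.reject { |line| line.start_with?('[git-p4:') }.join
end
```

Because a message filter receives the commit message on standard input and writes the new message to standard output, a script that prints strip_git_p4($stdin.read) could stand in for the sed command.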
8.2.4. A Custom Importer
If your system isn't Subversion or Perforce, you should look for an
importer online. Quality importers are available for CVS, ClearCase,
Visual SourceSafe, even a directory of archives. If none of
these tools works for you, you have a rarer tool, or you otherwise
need a more custom importing process, you should use
git fast-import. This command reads simple
instructions from stdin to write specific Git data. It's much
easier to create Git objects this way than to run the raw Git
commands or try to write the raw objects (see Chapter 9 for more
information). This way, you can write an import script that reads
the necessary information out of the system you're importing from
and prints straightforward instructions to stdout. You can then run
this program and pipe its output through
git fast-import.
To quickly demonstrate, you'll write a simple importer. Suppose you
work in current, you back up your project by occasionally copying
the directory into a time-stamped
back_YYYY_MM_DD backup directory, and you want
to import this into Git. Your directory structure looks like this:
$ ls /opt/import_from
back_2009_01_02
back_2009_01_04
back_2009_01_14
back_2009_02_03
current
In order to import a Git directory, you need to review how Git
stores its data. As you may remember, Git is fundamentally a linked
list of commit objects that point to a snapshot of content. All you
have to do is tell fast-import what the content
snapshots are, what commit data points to them, and the order they
go in. Your strategy will be to go through the snapshots one at a
time and create commits with the contents of each directory,
linking each commit back to the previous one.
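Because each commit links back to the previous one, the order in which the snapshot directories are visited determines the history. Depending on the Ruby version and platform, a directory listing is not guaranteed to come back sorted, so it can help to order the names explicitly. The following is a minimal sketch under that assumption; snapshot_order is a hypothetical helper, and it relies on the back_YYYY_MM_DD naming shown above, which sorts chronologically as plain text.

```ruby
# Hypothetical helper: order snapshot directory names oldest first.
# back_YYYY_MM_DD names sort chronologically as plain strings; 'current'
# is treated as the newest snapshot and always goes last. Any other
# name is ignored.
def snapshot_order(names)
  backups = names.select { |n| n.match?(/\Aback_\d{4}_\d{2}_\d{2}\z/) }.sort
  backups + (names.include?('current') ? ['current'] : [])
end
```

An importer could pass its directory listing through this helper before looping, so that current always becomes the most recent commit.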
As in the "An Example Git Enforced Policy" section of Chapter 7, we'll write this in Ruby, because it's what I generally work with and it tends to be easy to read. You can write this example pretty easily in anything you're familiar with; it just needs to print the appropriate information to stdout. If you are running on Windows, take special care not to introduce carriage returns at the end of your lines: git fast-import is very particular about wanting just line feeds (LF), not the carriage return line feeds (CRLF) that Windows uses.
To begin, you'll change into the target directory and identify every subdirectory, each of which is a snapshot that you want to import as a commit. You'll change into each subdirectory and print the commands necessary to export it. Your basic main loop looks like this:
last_mark = nil
# loop through the directories
Dir.chdir(ARGV[0]) do
  Dir.glob("*").each do |dir|
    next if File.file?(dir)
    # move into the target directory
    Dir.chdir(dir) do
      last_mark = print_export(dir, last_mark)
    end
  end
end
You run print_export inside each directory. It takes the directory name and the mark of the previous snapshot and returns the mark of this one; that way, you can link them properly. "Mark" is the fast-import term for an identifier you give to a commit; as you create commits, you give each one a mark that you can use to link to it from other commits. So, the first thing to do in your print_export method is generate a mark from the directory name:
mark = convert_dir_to_mark(dir)
You'll do this by creating an array of directories and using the index value as the mark, because a mark must be an integer. Your method looks like this:
$marks = []
def convert_dir_to_mark(dir)
  if !$marks.include?(dir)
    $marks << dir
  end
  ($marks.index(dir) + 1).to_s
end
Now that you have an integer representation of your commit, you
need a date for the commit metadata. Because the date is expressed
in the name of the directory, you'll parse it out. The next line in
your print_export method is
date = convert_dir_to_date(dir)
where convert_dir_to_date is defined as
def convert_dir_to_date(dir)
  if dir == 'current'
    return Time.now().to_i
  else
    dir = dir.gsub('back_', '')
    (year, month, day) = dir.split('_')
    return Time.local(year, month, day).to_i
  end
end
That returns an integer value for the date of each directory. The last piece of meta-information you need for each commit is the committer data, which you hardcode in a global variable:
$author = 'Scott Chacon <[email protected]>'
Now you're ready to begin printing out the commit data for your importer. The initial information states that you're defining a commit object and what branch it's on, followed by the mark you've generated, the committer information and commit message, and then the previous commit, if any. The code looks like this:
# print the import information
puts 'commit refs/heads/master'
puts 'mark :' + mark
puts "committer #{$author} #{date} -0700"
export_data('imported from ' + dir)
puts 'from :' + last_mark if last_mark
You hardcode the time zone (-0700) because doing so is easy. If you're importing from another system, you must specify the time zone as an offset. The commit message must be expressed in a special format:
data (size)\n(contents)
The format consists of the word data, the size of the data to be
read, a newline, and finally the data. Because you need to use the
same format to specify the file contents later, you create a helper
method, export_data:
def export_data(string)
  # fast-import expects the size in bytes, not in characters
  print "data #{string.bytesize}\n#{string}"
end
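One detail of the data format is worth illustrating: the size is a count of bytes, which differs from the number of characters as soon as the text contains non-ASCII characters. The sketch below uses a hypothetical function, data_block, that returns the framed string instead of printing it, purely so the result is easy to inspect; it assumes UTF-8 source text.

```ruby
# Build the "data" framing that fast-import reads: a byte count, a
# newline, then exactly that many bytes of payload.
def data_block(string)
  "data #{string.bytesize}\n#{string}"
end

text = "héllo"
text.size      # 5 characters
text.bytesize  # 6 bytes, because "é" takes two bytes in UTF-8
```

With this helper, data_block("héllo") yields a header of "data 6", whereas a character count would announce 5 and leave fast-import one byte short.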
All that's left is to specify the file contents for each snapshot.
This is easy, because you have each one in a directory — you can
print out the deleteall command followed by the
contents of each file in the directory. Git will then record each
snapshot appropriately:
puts 'deleteall'
Dir.glob("**/*").each do |file|
  next if !File.file?(file)
  inline_data(file)
end
Note: Because many systems think of their revisions as changes from
one commit to another, fast-import can also take commands with each
commit to specify which files have been added, removed, or modified
and what the new contents are. You could calculate the differences
between snapshots and provide only this data, but doing so is more
complex — you may as well give Git all the data and let it figure
it out. If this is better suited to your data, check the
fast-import man page for details about how to
provide your data in this manner.
The format for listing the new file contents or specifying a modified file with the new contents is as follows:
M 644 inline path/to/file
data (size)
(file contents)
Here, 644 is the mode (if you have executable files, you need to
detect and specify 755 instead), and inline says you'll list the
contents immediately after this line. Your
inline_data method looks like this:
def inline_data(file, code = 'M', mode = '644')
  content = File.read(file)
  puts "#{code} #{mode} inline #{file}"
  export_data(content)
end
You reuse the export_data method you defined
earlier, because it's the same as the way you specified your commit
message data.
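The text mentions that executable files need mode 755 instead of 644. One possible way to detect this is sketched below; mode_for is a hypothetical helper, and it assumes a Unix-like system where the executable bit is meaningful (the check behaves differently on Windows).

```ruby
# Hypothetical helper: choose the fast-import file mode for a path,
# based on whether the file is executable by the current user.
def mode_for(file)
  File.executable?(file) ? '755' : '644'
end
```

The result can be passed as the third argument of the method above, for example inline_data(file, 'M', mode_for(file)).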
The last thing you need to do is to return the current mark so it can be passed to the next iteration:
return mark
NOTE: If you are running on Windows, you'll need one extra step. As mentioned before, Windows uses CRLF for newline characters while git fast-import expects only LF. To get around this problem and make git fast-import happy, tell Ruby to write LF instead of CRLF by putting standard output into binary mode:
$stdout.binmode
That's it. If you run this script, you'll get content that looks something like this:
$ ruby import.rb /opt/import_from
commit refs/heads/master
mark :1
committer Scott Chacon <[email protected]> 1230883200 -0700
data 29
imported from back_2009_01_02deleteall
M 644 inline file.rb
data 12
version two
commit refs/heads/master
mark :2
committer Scott Chacon <[email protected]> 1231056000 -0700
data 29
imported from back_2009_01_04from :1
deleteall
M 644 inline file.rb
data 14
version three
M 644 inline new.rb
data 16
new version one
(...)
To run the importer, pipe this output through
git fast-import while in the Git directory you
want to import into. You can create a new directory and then run
git init in it for a starting point, and then
run your script:
$ git init
Initialized empty Git repository in /opt/import_to/.git/
$ ruby import.rb /opt/import_from | git fast-import
git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects: 5000
Total objects: 18 ( 1 duplicates )
blobs : 7 ( 1 duplicates 0 deltas)
trees : 6 ( 0 duplicates 1 deltas)
commits: 5 ( 0 duplicates 0 deltas)
tags : 0 ( 0 duplicates 0 deltas)
Total branches: 1 ( 1 loads )
marks: 1024 ( 5 unique )
atoms: 3
Memory total: 2255 KiB
pools: 2098 KiB
objects: 156 KiB
---------------------------------------------------------------------
pack_report: getpagesize() = 4096
pack_report: core.packedGitWindowSize = 33554432
pack_report: core.packedGitLimit = 268435456
pack_report: pack_used_ctr = 9
pack_report: pack_mmap_calls = 5
pack_report: pack_open_windows = 1 / 1
pack_report: pack_mapped = 1356 / 1356
---------------------------------------------------------------------
As you can see, when it completes successfully, it gives you a
bunch of statistics about what it accomplished. In this case, you
imported 18 objects total for 5 commits into 1 branch. Now, you can
run git log to see your new history:
$ git log -2
commit 10bfe7d22ce15ee25b60a824c8982157ca593d41
Author: Scott Chacon <[email protected]>
Date:   Sun May 3 12:57:39 2009 -0700

    imported from current

commit 7e519590de754d079dd73b44d695a42c9d2df452
Author: Scott Chacon <[email protected]>
Date:   Tue Feb 3 01:00:00 2009 -0700

    imported from back_2009_02_03
There you go — a nice, clean Git repository. It's important to note
that nothing is checked out — you don't have any files in your
working directory at first. To get them, you must reset your branch
to where master is now:
$ ls
$ git reset --hard master
HEAD is now at 10bfe7d imported from current
$ ls
file.rb  lib
You can do a lot more with the fast-import tool
— handle different modes, binary data, multiple branches and
merging, tags, progress indicators, and more. A number of examples
of more complex scenarios are available in the
contrib/fast-import directory of the Git source
code; one of the better ones is the git-p4
script I just covered.
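As a small taste of the tag support mentioned above, fast-import's reset command can point a reference under refs/tags/ at an existing mark, which creates a lightweight tag. The sketch below only builds the command text; tag_commands is a hypothetical helper, and the tag name and mark passed to it are up to your importer.

```ruby
# Hypothetical helper: return the fast-import commands that point a
# lightweight tag at the commit identified by the given mark.
def tag_commands(name, mark)
  "reset refs/tags/#{name}\nfrom :#{mark}\n\n"
end
```

Printing tag_commands('v1', mark) after a commit's file data would tag that snapshot in the same fast-import stream.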