8.2. Migrer sur Git

Si vous avez une base de code dans un autre VCS et que vous avez décidé d'utiliser Git, vous devez migrer votre projet d'une manière ou d'une autre. Ce chapitre traite d'outils d'import inclus dans Git avec des systèmes communs et démontre comment développer votre propre outil.

8.2.1. Importer

Nous allons détailler la manière d'importer des données à partir de deux des plus grands systèmes SCM utilisés en milieu professionnel, Subversion et Perforce, pour les raisons combinées qu'ils regroupent la majorité des utilisateurs que je connais migrer vers Git et que des outils de grande qualité pour ces deux systèmes sont distribués avec Git.

8.2.2. Subversion

Si vous avez lu la section précédente sur l'utilisation de git svn, vous pouvez facilement utiliser ces instructions pour réaliser un git svn clone du dépôt. Ensuite, arrêtez d'utiliser le serveur Subversion, poussez sur un nouveau serveur Git et commencez à l'utiliser. Si vous voulez l'historique, vous pouvez l'obtenir aussi rapidement que vous pourrez tirer les données du serveur Subversion (ce qui peut prendre un certain temps).

Cependant, l'import n'est pas parfait ; et comme cela prend autant de temps, autant le faire bien. Le premier problème est l'information d'auteur. Dans Subversion, chaque personne qui valide dispose d'un compte sur le système qui est enregistré dans l'information de validation. Les exemples de la section précédente montrent schacon à certains endroits, tels que la sortie de blame ou de git svn log. Si vous voulez transposer ces données vers des données d'auteur au format Git, vous avez besoin d'une correspondance entre les utilisateurs Subversion et les auteurs Git. Créez un fichier appelé users.txt contenant cette équivalence dans le format suivant :

schacon = Scott Chacon <[email protected]>
selse = Someo Nelse <[email protected]>

Pour récupérer la liste des noms d'auteurs utilisés par SVN, vous pouvez utiliser la ligne suivante :

$ svn log --xml | grep author | sort -u | perl -pe 's/.>(.?)<./$1 = /'

Cela génère une sortie au format XML — vous pouvez visualiser les auteurs, créer une liste unique puis éliminer l'XML. Évidemment, cette ligne ne fonctionne que sur une machine disposant des commandes grep, sort et perl. Ensuite, redirigez votre sortie dans votre fichier users.txt pour pouvoir y ajouter en correspondance les données équivalentes Git.

Vous pouvez alors fournir ce fichier à git svn pour l'aider à convertir les données d'auteur plus précisément. Vous pouvez aussi indiquer à git svn de ne pas inclure les métadonnées que Subversion importe habituellement en passant l'option --no-metadata à la commande clone ou init. Au final, votre commande d'import ressemble à ceci :

$ git-svn clone http://mon-projet.googlecode.com/svn/ \
      --authors-file=users.txt --no-metadata -s my_project

Maintenant, l'import depuis Subversion dans le répertoire my_project est plus présentable. En lieu et place de commits qui ressemblent à ceci :

commit 37efa680e8473b615de980fa935944215428a35a
Author: schacon <schacon@4c93b258-373f-11de-be05-5f7a86268029>
Date:   Sun May 3 00:12:22 2009 +0000

    fixed install - go to trunk

    git-svn-id: https://my-project.googlecode.com/svn/trunk@94 4c93b258-373f-11de-
    be05-5f7a86268029

les commits ressemblent à ceci :

commit 03a8785f44c8ea5cdb0e8834b7c8e6c469be2ff2
Author: Scott Chacon <[email protected]>
Date:   Sun May 3 00:12:22 2009 +0000

    fixed install - go to trunk

Non seulement le champ auteur a meilleure mine, mais de plus le champ git-svn-id a disparu.

Il est encore nécessaire de faire un peu de ménage post-import. Déjà, vous devriez nettoyer les références bizarres que git svn crée. Premièrement, déplacez les balises pour qu'elles soient de vraies balises plutôt que des branches distantes étranges, ensuite déplacez le reste des branches pour qu'elles deviennent locales.

Pour déplacer les balises et en faire de vraies balises Git, lancez

$ cp -Rf .git/refs/remotes/tags/* .git/refs/tags/
$ rm -Rf .git/refs/remotes/tags

Cela récupère les références déclarées comme branches distantes commençant par tags/ et les transforme en vraies balises (légères).

Ensuite, déplacez le reste des références sous refs/remotes en branches locales :

$ cp -Rf .git/refs/remotes/* .git/refs/heads/
$ rm -Rf .git/refs/remotes

À présent, toutes les vieilles branches sont des vraies branches Git et toutes les vieilles balises sont de vraies balises Git. La dernière activité consiste à ajouter votre nouveau serveur Git comme serveur distant et à y pousser votre projet transformé. Pour pousser tout, y compris branches et balises, lancez :

$ git push origin --all

Toutes vos données, branches et tags sont à présent disponibles sur le serveur Git comme import propre et naturel.

8.2.3. Perforce

The next system you'll look at importing from is Perforce. A Perforce importer is also distributed with Git, but only in the contrib section of the source code — it isn't available by default like git svn. To run it, you must get the Git source code, which you can download from git.kernel.org:

$ git clone git://git.kernel.org/pub/scm/git/git.git
$ cd git/contrib/fast-import

In this fast-import directory, you should find an executable Python script named git-p4. You must have Python and the p4 tool installed on your machine for this import to work. For example, you'll import the Jam project from the Perforce Public Depot. To set up your client, you must export the P4PORT environment variable to point to the Perforce depot:

$ export P4PORT=public.perforce.com:1666

Run the git-p4 clone command to import the Jam project from the Perforce server, supplying the depot and project path and the path into which you want to import the project:

$ git-p4 clone //public/jam/src@all /opt/p4import
Importing from //public/jam/src@all into /opt/p4import
Reinitialized existing Git repository in /opt/p4import/.git/
Import destination: refs/remotes/p4/master
Importing revision 4409 (100%)

If you go to the /opt/p4import directory and run git log, you can see your imported work:

$ git log -2
commit 1fd4ec126171790efd2db83548b85b1bbbc07dc2
Author: Perforce staff <[email protected]>
Date:   Thu Aug 19 10:18:45 2004 -0800

    Drop 'rc3' moniker of jam-2.5.  Folded rc2 and rc3 RELNOTES into
    the main part of the document.  Built new tar/zip balls.

    Only 16 months later.

    [git-p4: depot-paths = "//public/jam/src/": change = 4409]

commit ca8870db541a23ed867f38847eda65bf4363371d
Author: Richard Geiger <[email protected]>
Date:   Tue Apr 22 20:51:34 2003 -0800

    Update derived jamgram.c

    [git-p4: depot-paths = "//public/jam/src/": change = 3108]

You can see the git-p4 identifier in each commit. It's fine to keep that identifier there, in case you need to reference the Perforce change number later. However, if you'd like to remove the identifier, now is the time to do so — before you start doing work on the new repository. You can use git filter-branch to remove the identifier strings en masse:

$ git filter-branch --msg-filter '
        sed -e "/^\[git-p4:/d"
'
Rewrite 1fd4ec126171790efd2db83548b85b1bbbc07dc2 (123/123)
Ref 'refs/heads/master' was rewritten

If you run git log, you can see that all the SHA–1 checksums for the commits have changed, but the git-p4 strings are no longer in the commit messages:

$ git log -2
commit 10a16d60cffca14d454a15c6164378f4082bc5b0
Author: Perforce staff <[email protected]>
Date:   Thu Aug 19 10:18:45 2004 -0800

    Drop 'rc3' moniker of jam-2.5.  Folded rc2 and rc3 RELNOTES into
    the main part of the document.  Built new tar/zip balls.

    Only 16 months later.

commit 2b6c6db311dd76c34c66ec1c40a49405e6b527b2
Author: Richard Geiger <[email protected]>
Date:   Tue Apr 22 20:51:34 2003 -0800

    Update derived jamgram.c

Your import is ready to push up to your new Git server.

8.2.4. A Custom Importer

If your system isn't Subversion or Perforce, you should look for an importer online — quality importers are available for CVS, Clear Case, Visual Source Safe, even a directory of archives. If none of these tools works for you, you have a rarer tool, or you otherwise need a more custom importing process, you should use git fast-import. This command reads simple instructions from stdin to write specific Git data. It's much easier to create Git objects this way than to run the raw Git commands or try to write the raw objects (see Chapter 9 for more information). This way, you can write an import script that reads the necessary information out of the system you're importing from and prints straightforward instructions to stdout. You can then run this program and pipe its output through git fast-import.

To quickly demonstrate, you'll write a simple importer. Suppose you work in current, you back up your project by occasionally copying the directory into a time-stamped back_YYYY_MM_DD backup directory, and you want to import this into Git. Your directory structure looks like this:

$ ls /opt/import_from
back_2009_01_02
back_2009_01_04
back_2009_01_14
back_2009_02_03
current

In order to import a Git directory, you need to review how Git stores its data. As you may remember, Git is fundamentally a linked list of commit objects that point to a snapshot of content. All you have to do is tell fast-import what the content snapshots are, what commit data points to them, and the order they go in. Your strategy will be to go through the snapshots one at a time and create commits with the contents of each directory, linking each commit back to the previous one.

As you did in the « An Example Git Enforced Policy » section of Chapter 7, we'll write this in Ruby, because it's what I generally work with and it tends to be easy to read. You can write this example pretty easily in anything you're familiar with — it just needs to print the appropriate information to stdout. And, if you are running on Windows, this means you'll need to take special care to not introduce carriage returns at the end your lines — git fast-import is very particular about just wanting line feeds (LF) not the carriage return line feeds (CRLF) that Windows uses.

To begin, you'll change into the target directory and identify every subdirectory, each of which is a snapshot that you want to import as a commit. You'll change into each subdirectory and print the commands necessary to export it. Your basic main loop looks like this:

last_mark = nil

# loop through the directories
Dir.chdir(ARGV[0]) do
  Dir.glob("*").each do |dir|
    next if File.file?(dir)

    # move into the target directory
    Dir.chdir(dir) do 
      last_mark = print_export(dir, last_mark)
    end
  end
end

You run print_export inside each directory, which takes the manifest and mark of the previous snapshot and returns the manifest and mark of this one; that way, you can link them properly. « Mark » is the fast-import term for an identifier you give to a commit; as you create commits, you give each one a mark that you can use to link to it from other commits. So, the first thing to do in your print_export method is generate a mark from the directory name:

mark = convert_dir_to_mark(dir)

You'll do this by creating an array of directories and using the index value as the mark, because a mark must be an integer. Your method looks like this:

$marks = []
def convert_dir_to_mark(dir)
  if !$marks.include?(dir)
    $marks << dir
  end
  ($marks.index(dir) + 1).to_s
end

Now that you have an integer representation of your commit, you need a date for the commit metadata. Because the date is expressed in the name of the directory, you'll parse it out. The next line in your print_export file is

date = convert_dir_to_date(dir)

where convert_dir_to_date is defined as

def convert_dir_to_date(dir)
  if dir == 'current'
    return Time.now().to_i
  else
    dir = dir.gsub('back_', '')
    (year, month, day) = dir.split('_')
    return Time.local(year, month, day).to_i
  end
end

That returns an integer value for the date of each directory. The last piece of meta-information you need for each commit is the committer data, which you hardcode in a global variable:

$author = 'Scott Chacon <[email protected]>'

Now you're ready to begin printing out the commit data for your importer. The initial information states that you're defining a commit object and what branch it's on, followed by the mark you've generated, the committer information and commit message, and then the previous commit, if any. The code looks like this:

# print the import information
puts 'commit refs/heads/master'
puts 'mark :' + mark
puts "committer #{$author} #{date} -0700"
export_data('imported from ' + dir)
puts 'from :' + last_mark if last_mark

You hardcode the time zone (–0700) because doing so is easy. If you're importing from another system, you must specify the time zone as an offset. The commit message must be expressed in a special format:

data (size)\n(contents)

The format consists of the word data, the size of the data to be read, a newline, and finally the data. Because you need to use the same format to specify the file contents later, you create a helper method, export_data:

def export_data(string)
  print "data #{string.size}\n#{string}"
end

All that's left is to specify the file contents for each snapshot. This is easy, because you have each one in a directory — you can print out the deleteall command followed by the contents of each file in the directory. Git will then record each snapshot appropriately:

puts 'deleteall'
Dir.glob("**/*").each do |file|
  next if !File.file?(file)
  inline_data(file)
end

Note: Because many systems think of their revisions as changes from one commit to another, fast-import can also take commands with each commit to specify which files have been added, removed, or modified and what the new contents are. You could calculate the differences between snapshots and provide only this data, but doing so is more complex — you may as well give Git all the data and let it figure it out. If this is better suited to your data, check the fast-import man page for details about how to provide your data in this manner.

The format for listing the new file contents or specifying a modified file with the new contents is as follows:

M 644 inline path/to/file
data (size)
(file contents)

Here, 644 is the mode (if you have executable files, you need to detect and specify 755 instead), and inline says you'll list the contents immediately after this line. Your inline_data method looks like this:

def inline_data(file, code = 'M', mode = '644')
  content = File.read(file)
  puts "#{code} #{mode} inline #{file}"
  export_data(content)
end

You reuse the export_data method you defined earlier, because it's the same as the way you specified your commit message data.

The last thing you need to do is to return the current mark so it can be passed to the next iteration:

return mark

NOTE: If you are running on Windows you'll need to make sure that you add one extra step. As metioned before, Windows uses CRLF for new line characters while git fast-import expects only LF. To get around this problem and make git fast-import happy, you need to tell ruby to use LF instead of CRLF:

$stdout.binmode

That's it. If you run this script, you'll get content that looks something like this:

$ ruby import.rb /opt/import_from 
commit refs/heads/master
mark :1
committer Scott Chacon <[email protected]> 1230883200 -0700
data 29
imported from back_2009_01_02deleteall
M 644 inline file.rb
data 12
version two
commit refs/heads/master
mark :2
committer Scott Chacon <[email protected]> 1231056000 -0700
data 29
imported from back_2009_01_04from :1
deleteall
M 644 inline file.rb
data 14
version three
M 644 inline new.rb
data 16
new version one
(...)

To run the importer, pipe this output through git fast-import while in the Git directory you want to import into. You can create a new directory and then run git init in it for a starting point, and then run your script:

$ git init
Initialized empty Git repository in /opt/import_to/.git/
$ ruby import.rb /opt/import_from | git fast-import
git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:       5000
Total objects:           18 (         1 duplicates                  )
      blobs  :            7 (         1 duplicates          0 deltas)
      trees  :            6 (         0 duplicates          1 deltas)
      commits:            5 (         0 duplicates          0 deltas)
      tags   :            0 (         0 duplicates          0 deltas)
Total branches:           1 (         1 loads     )
      marks:           1024 (         5 unique    )
      atoms:              3
Memory total:          2255 KiB
       pools:          2098 KiB
     objects:           156 KiB
---------------------------------------------------------------------
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize =   33554432
pack_report: core.packedGitLimit      =  268435456
pack_report: pack_used_ctr            =          9
pack_report: pack_mmap_calls          =          5
pack_report: pack_open_windows        =          1 /          1
pack_report: pack_mapped              =       1356 /       1356
---------------------------------------------------------------------

As you can see, when it completes successfully, it gives you a bunch of statistics about what it accomplished. In this case, you imported 18 objects total for 5 commits into 1 branch. Now, you can run git log to see your new history:

$ git log -2
commit 10bfe7d22ce15ee25b60a824c8982157ca593d41
Author: Scott Chacon <[email protected]>
Date:   Sun May 3 12:57:39 2009 -0700

    imported from current

commit 7e519590de754d079dd73b44d695a42c9d2df452
Author: Scott Chacon <[email protected]>
Date:   Tue Feb 3 01:00:00 2009 -0700

    imported from back_2009_02_03

There you go — a nice, clean Git repository. It's important to note that nothing is checked out — you don't have any files in your working directory at first. To get them, you must reset your branch to where master is now:

$ ls
$ git reset --hard master
HEAD is now at 10bfe7d imported from current
$ ls
file.rb  lib

You can do a lot more with the fast-import tool — handle different modes, binary data, multiple branches and merging, tags, progress indicators, and more. A number of examples of more complex scenarios are available in the contrib/fast-import directory of the Git source code; one of the better ones is the git-p4 script I just covered.

n
Next Page
p
Previos Page
h
Book Home
u
Go Up One Level
?
Press ? for Help
esc
Hide Help
Your Ad Here