
How To Compile, Install and Run GIZA++

November 18th, 2008

One component of my QA search engine (part of my research work) needs information on the semantic relatedness between words, and I plan to use GIZA++ for part of that job. GIZA++ is a statistical machine translation toolkit for training IBM Models 1-5 and an HMM word alignment model.

This project is NOT well documented, and I had some difficulty at the very beginning with both compilation and execution. After several hours of trial and error, and with the help of Google, I finally got it working, so I’m writing it up here. Hopefully it will be helpful to you.

Download GIZA++:

GIZA++ is available here: http://code.google.com/p/giza-pp/
$ wget http://giza-pp.googlecode.com/files/giza-pp-v1.0.2.tar.gz

Compilation/Installation of GIZA++:

The current version of GIZA++ cannot be compiled with gcc 3.4.6. If you run into the same problem, update your gcc. I’m using gcc 4.1.2; it’s not the most up-to-date version, but it works for GIZA++.

1. Decompress the tar file, and you will get a directory named giza-pp. There are two sub-folders under it, GIZA++-v2 and mkcls-v2, which contain the GIZA++ toolkit and the mkcls tool respectively.
$ tar xvzf giza-pp-v1.0.2.tar.gz

2. Modify the Makefile under giza-pp/GIZA++-v2:
(a) change the INSTALLDIR to the one appropriate for you (not necessary if you don’t invoke make install).
(b) find -DBINARY_SEARCH_FOR_TTABLE option and delete it (Why? See this).
$ cd giza-pp/GIZA++-v2
$ vi Makefile

3. Under the giza-pp directory, run make to compile the source code for both GIZA++ and mkcls. This produces executables for both projects.
Typing make install will only copy the GIZA++ executable to the directory you specified, so whether to run it is up to you.
$ make
$ make install
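
Steps 2(b) and 3 can also be done without opening an editor. The following is only a sketch: the helper name strip_ttable_flag is made up for this example, and it assumes the define appears literally as -DBINARY_SEARCH_FOR_TTABLE in the Makefile and that the archive was unpacked into ./giza-pp.

```shell
#!/bin/sh
# Hypothetical helper (not part of GIZA++): remove the
# -DBINARY_SEARCH_FOR_TTABLE define from a Makefile in place.
# A backup of the original is kept next to it as <Makefile>.bak.
strip_ttable_flag() {
    sed -i.bak 's/[[:space:]]*-DBINARY_SEARCH_FOR_TTABLE//g' "$1"
}

# Only act when the unpacked source tree is present (assumed location).
if [ -f giza-pp/GIZA++-v2/Makefile ]; then
    strip_ttable_flag giza-pp/GIZA++-v2/Makefile
    (cd giza-pp && make)
fi
```

If the build misbehaves afterwards, the untouched Makefile.bak can be copied back.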

Running GIZA++ (Training IBM Model):

Let’s say we already have a parallel corpus stored in two files named english and foreign.
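
Before converting, it may be worth checking that the two files line up. The sketch below assumes the usual layout for a parallel corpus, which this post does not spell out: one sentence per line, with line N of english being the translation of line N of foreign. The function name same_line_count is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if both files exist and have the
# same number of lines (assumption: one sentence per line in each file).
same_line_count() {
    [ -f "$1" ] && [ -f "$2" ] || return 2
    a=$(wc -l < "$1" | tr -d ' ')
    b=$(wc -l < "$2" | tr -d ' ')
    [ "$a" -eq "$b" ]
}

if same_line_count english foreign; then
    echo "corpus looks line-aligned"
else
    echo "line counts differ or a file is missing" >&2
fi
```

A matching line count does not prove the sentences are correctly paired, but a mismatch is a sure sign something is wrong.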

1. Run plain2snt.out (under GIZA++-v2) to convert the parallel corpus into GIZA++ format.
$ plain2snt.out english foreign
This produces the following files: english.vcb, foreign.vcb, english_foreign.snt and foreign_english.snt

2. Generate word classes, using mkcls (under mkcls-v2):
$ mkcls -penglish -Venglish.vcb.classes
$ mkcls -pforeign -Vforeign.vcb.classes
This produces four output files: english.vcb.classes, english.vcb.classes.cats, foreign.vcb.classes and foreign.vcb.classes.cats

3. Now use GIZA++ (under GIZA++-v2) to build the dictionary (-S is the source-language vocabulary file, -T is the target-language vocabulary file, -C is the sentence-aligned corpus file generated in step 1, and -O is the output file prefix):
$ GIZA++ -S english.vcb -T foreign.vcb -C english_foreign.snt -O dic

The final alignment file will be dic.A3.final
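
For other file names, the three steps above can be generated from a small helper. This is a sketch only: giza_commands is a made-up name, it assumes plain2snt.out, mkcls and GIZA++ are on your PATH, and it simply prints the same commands shown above so they can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical helper: print the training commands for a corpus pair.
# Arguments: source file, target file, output prefix.
# The commands mirror steps 1-3 above; nothing is executed here.
giza_commands() {
    src=$1; tgt=$2; prefix=$3
    echo "plain2snt.out $src $tgt"
    echo "mkcls -p$src -V$src.vcb.classes"
    echo "mkcls -p$tgt -V$tgt.vcb.classes"
    echo "GIZA++ -S $src.vcb -T $tgt.vcb -C ${src}_$tgt.snt -O $prefix"
}

giza_commands english foreign dic
```

Once the printed commands look right, piping the output to sh (for example: giza_commands english foreign dic | sh) runs them in order.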

  1. November 19th, 2008 at 16:00 | #1

    It is a useful comment on GIZA++. Do you know whether it is possible to run this tool under Windows or not?

  2. kwang
    November 19th, 2008 at 16:33 | #2

    Well, as far as I know, there is some work on porting GIZA++ to Windows, but not mkcls. It seems mkcls is not directly compilable on Windows.

    If you insist on using it on Windows, I would suggest using ‘cygwin’. :)

  3. Nhu Khanh
    December 23rd, 2008 at 01:35 | #3

    Have you ever used a dictionary for training in GIZA? Could you help me with how to use a dictionary in GIZA? I tried many times, but the result is the same whether or not I use the dictionary. I use this command:
    $ GIZA++ -S english.vcb -T foreign.vcb -C english_foreign.snt – dictionary dic_file -O dic
    where dic_file is a dictionary in the format described in the GIZA manual.

    Thank you so much.

  4. Anand kumar
    May 11th, 2009 at 19:13 | #4

    I got this error at the time of running GIZA…

    GIZA++ -S corpus/sour.vcb -T corpus/targ.vcb -C corpus/sour_targ.snt

    ERROR: NO COOCURRENCE FILE GIVEN!
    Aborted

  5. Kallerdis
    May 22nd, 2009 at 19:24 | #5

    Anand,

    I had the same problem running GIZA++. Here is a solution.

    http://code.google.com/p/giza-pp/issues/detail?id=3

  6. mike
    June 11th, 2009 at 22:21 | #6

    Thanks for telling us how to use this most important thing. I am trying to use mgiza, though: I can compile it, but I can’t use it. Could you tell me how to use it? Thanks a lot.

  7. gizaic
    July 30th, 2009 at 15:41 | #7

    Can you help me?
    I need to give parse trees to GIZA++ as input. How should I proceed?
    Please help.

  8. MOON
    August 23rd, 2009 at 07:24 | #8

    I know GIZA++ can be used for training.
    Can GIZA++ do translation work?
    If it cannot, how can I use the generated .vcb files and
    .final files to do translation work?

  9. eknath
    November 27th, 2009 at 19:51 | #9

    1) When I went to delete the -DBINARY option inside the Makefile, I found only one line containing it.

    2) Running make in the giza-pp dir gives me errors, and so does make install.
