GENERAL INFORMATION
Software: LGExtract
Authors: Matthieu Constant and Elsa Tolone
Organization: Université Paris-Est, LIGM
License: LGPL
Version: 3.3
Date: 2011/03/31
URL: http://infolingu.univ-mlv.fr/

REFERENCES
Constant, Matthieu & Tolone, Elsa (2010). A generic tool to generate a lexicon 
for NLP from Lexicon-Grammar tables. Lingue d'Europa e del Mediterraneo, 
Grammatica comparata, vol. 1, pp. 79-93. Edited by Michele De Gioia. Aracne.

Tolone, Elsa (2011). Analyse syntaxique à l'aide des tables du Lexique-Grammaire 
du français. Doctoral thesis, LIGM, Université Paris-Est. 326 pp.


DESCRIPTION
LGExtract is an open-source tool that converts Lexicon-Grammar tables into 
LGLex, an XML-structured syntactic lexicon. It is implemented in Java.


USAGE
Usage: java fr.umlv.lgextract.LGExtract --script <script> --tableDirPath 
<tableDirPath> --headerColumn <headerColumn> --headerRow <headerRow> [--debug] 
[--tdtValues <tdtValues>] [--values <values>] [--output <output>] 
[--tableFormat <tableFormat>] <tdt>
  --script <script>
        Script pathname (the script file must be encoded in UTF-8)

  --tableDirPath <tableDirPath>
        Path of the directory containing the tables

  --headerColumn <headerColumn>
        Column id of the property headers in the table of classes (first column
        is 0)

  --headerRow <headerRow>
        Row id of the table headers in the table of classes (first row is 0)

  [--debug]
        Run the program in debug mode (if absent, the program runs in normal
        mode)

  [--tdtValues <tdtValues>]
        Path of the file containing interpretations of the values in the table
        of classes. Default: standard Lexicon-Grammar value interpretation

  [--values <values>]
        Path of the file containing interpretations of the values in the
        Lexicon-Grammar tables. Default: standard Lexicon-Grammar value
        interpretation

  [--output <output>]
        Output type: textual (txt) or XML (xml). Default value: txt

  [--tableFormat <tableFormat>]
        Table format: Excel (xls) or CSV (csv). Default value: csv

  <tdt>
        List of the tables of classes used
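
As an illustration, the options above could be combined as in the sketch 
below. The helper name lgextract_cmd is hypothetical, and the header indices 
(2 and 0) are placeholders, not values shipped with LGExtract; the directory 
../verbes and the file tdt-verbes.xls are borrowed from the makeTableOfTables 
example in this file. The helper only prints the command line (a dry run), so 
it can be reviewed before being executed.

```shell
#!/bin/sh
# Dry-run helper (hypothetical name): prints an LGExtract command line
# instead of running it.
# Arguments: <script> <tableDirPath> <headerColumn> <headerRow> <tableFormat> <tdt>
# Note: the arguments are printed unquoted, so paths containing spaces
# would need quoting before the line is pasted into a shell.
lgextract_cmd() {
    printf 'java fr.umlv.lgextract.LGExtract --script %s --tableDirPath %s --headerColumn %s --headerRow %s --output xml --tableFormat %s %s\n' \
        "$1" "$2" "$3" "$4" "$5" "$6"
}

# Placeholder values chosen for illustration only.
lgextract_cmd lgc_verbes.lg ../verbes 2 0 xls tdt-verbes.xls
```

The predefined scripts described further down redirect standard output to a 
file, so the same redirection is presumably appropriate when the printed 
command is run by hand.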

EXTERNAL LIBRARIES
- Tatoo (http://tatoo.univ-mlv.fr/): used to parse the configuration script 
[tatoo-runtime.jar]
- Velocity-dep (http://mvnrepository.com/artifact/velocity/velocity-dep): 
used by Tatoo [velocity-dep-1.4.jar]
- JDom (http://www.jdom.org/): used to generate the XML output [jdom.jar]
- JExcelApi (http://jexcelapi.sourceforge.net/): used to parse Excel files 
[jxl.jar]
- JSAP (http://martiansoftware.com/jsap/): used to parse the command line 
arguments [JSAP-2.1.jar]

The jar archive files of these libraries are included in the directory 'jar'. 
They must be added to the CLASSPATH environment variable before running 
LGExtract.
Example:
export CLASSPATH=$CLASSPATH:classes:jar/tatoo-runtime.jar:jar/velocity-dep-1.4.jar:jar/jxl.jar:jar/JSAP-2.1.jar:jar/jdom.jar
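
As an alternative to listing each archive by hand, the same CLASSPATH could 
be built with a small loop. This is a sketch only: the function name 
add_jars_to_classpath is hypothetical, and it assumes that every .jar file in 
the given directory should be added.

```shell
#!/bin/sh
# Append "classes" and every .jar found in a directory to CLASSPATH.
# Argument: directory holding the jar files (defaults to ./jar).
add_jars_to_classpath() {
    jar_dir=${1:-jar}
    # Keep any existing CLASSPATH, without leaving a leading colon when empty.
    CLASSPATH=${CLASSPATH:+$CLASSPATH:}classes
    for jar in "$jar_dir"/*.jar; do
        # Skip the unexpanded pattern when the directory has no jar files.
        [ -e "$jar" ] || continue
        CLASSPATH=$CLASSPATH:$jar
    done
    export CLASSPATH
}

add_jars_to_classpath jar
echo "$CLASSPATH"
```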


DIRECTORY CONTENT

* Predefined scripts
The LGExtract directory contains 8 predefined scripts that launch the 
LGExtract engine for a given part of speech (verbs, predicative nouns, 
adverbs or fixed sentences) and generate an LGLex lexicon in textual or XML 
format. Warning: these scripts require an environment variable TABLESPATH 
giving the path of the root directory that contains all the data (tables) 
and the software.
Examples:
./launch displays the help
./launch_verbes > $TABLESPATH/lglex/verbes-lglex.txt
./launch_noms-predicatifs > $TABLESPATH/lglex/noms-predicatifs-lglex.txt
./launch_figees > $TABLESPATH/lglex/figees-lglex.txt
./launch_adverbes > $TABLESPATH/lglex/adverbes-lglex.txt
./launch_verbes_xml > $TABLESPATH/lglex/verbes-lglex.xml
./launch_noms-predicatifs_xml > $TABLESPATH/lglex/noms-predicatifs-lglex.xml
./launch_figees_xml > $TABLESPATH/lglex/figees-lglex.xml
./launch_adverbes_xml > $TABLESPATH/lglex/adverbes-lglex.xml
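
Since every command above writes under $TABLESPATH/lglex, a quick check of 
the environment before launching can avoid a failed redirection. The sketch 
below is an illustration only; the function name check_tablespath is 
hypothetical and is not part of the distribution.

```shell
#!/bin/sh
# Check that TABLESPATH is set and that the lglex output directory exists.
# Returns 0 when the environment looks usable, 1 otherwise.
check_tablespath() {
    if [ -z "${TABLESPATH:-}" ]; then
        echo "TABLESPATH is not set" >&2
        return 1
    fi
    if [ ! -d "$TABLESPATH/lglex" ]; then
        echo "missing directory: $TABLESPATH/lglex" >&2
        return 1
    fi
    return 0
}
```

It could then guard a launch, for instance:
check_tablespath && ./launch_verbes > "$TABLESPATH/lglex/verbes-lglex.txt"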

Known bugs:
The program sometimes tries to read a row made up entirely of empty cells, 
located after the filled rows. The workaround is to delete empty rows at the 
end of the table until the error disappears. The problem is reported by an 
error message giving the id of the problematic row. Be aware that row 
numbering starts at 0.
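
For tables stored as CSV, the trailing empty rows could be removed 
automatically. The sketch below is an illustration under assumptions: the 
function name strip_trailing_empty_rows is hypothetical, the cell separator 
is assumed to be ';' unless another one is given, and a row counts as empty 
when it holds nothing but separators, double quotes and whitespace. Empty 
rows located between filled rows are kept.

```shell
#!/bin/sh
# Print a CSV table without its trailing empty rows.
# Arguments: <csv file> [separator, assumed to be ';' by default]
strip_trailing_empty_rows() {
    awk -v sep="${2:-;}" '
        {
            cells = $0
            # Remove quotes and whitespace, then every separator;
            # whatever remains is real cell content.
            gsub(/[" \t\r]/, "", cells)
            while ((i = index(cells, sep)) > 0)
                cells = substr(cells, 1, i - 1) substr(cells, i + length(sep))
            if (cells == "") {
                # Hold the row back: it is only printed if a filled row follows.
                pending = pending $0 "\n"
            } else {
                printf "%s%s\n", pending, $0
                pending = ""
            }
        }
    ' "$1"
}

# Demo on a throwaway file: two filled rows followed by two empty ones.
demo=$(mktemp)
printf 'a;b\nc;d\n;;\n;;\n' > "$demo"
strip_trailing_empty_rows "$demo"    # prints the two filled rows only
rm -f "$demo"
```

The result is written to standard output, so the original table is left 
untouched and can be compared with the cleaned copy.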


* Configuration files
4 configuration scripts (with file extension .lg), one per part of speech 
(lgc_verbes.lg, lgc_noms-predicatif.lg, lgc_figees.lg and lgc_adverbes.lg).
The configuration files are encoded in UTF-8.

* Table value specification files
2 files specifying the interpretation of the different values in the tables 
(tables-values.txt, tablesOfClasses-values.txt).
They are formatted as follows: a symbol, followed by a space, followed by 
the interpretation ('true' or 'false').
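
A file in this format can be checked before it is passed to --values or 
--tdtValues. The sketch below is an illustration only: the function name 
check_values_file is hypothetical, it assumes one entry per line, and the 
sample entries '+ true' and '- false' are example content, not a copy of the 
distributed files.

```shell
#!/bin/sh
# Report every line that is not "<symbol> true" or "<symbol> false".
# Exit status is 0 when the whole file is well formed, 1 otherwise.
check_values_file() {
    awk '
        !/^[^ ]+ (true|false)$/ { printf "line %d: %s\n", NR, $0; bad = 1 }
        END { exit bad }
    ' "$1"
}

# Demo on a throwaway file holding two sample entries.
sample=$(mktemp)
printf '+ true\n- false\n' > "$sample"
check_values_file "$sample" && echo "sample file is well formed"
rm -f "$sample"
```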

* Misc
1 executable, makeTableOfClasses, that generates the table of classes in 
Excel format
Usage: java LGExtractTableOfTables <dirpath of tables> <output xls file>
Example: ./makeTableOfTables ../verbes tdt-verbes.xls

1 Perl script, list2code.pl, that generates .lg code from a text file 
containing constructions (1 per line)
