Linguistics, Computational Linguistics, Corpus Linguistics

You can e-mail me at: alcazar@usc.edu
- PhD Candidate in Linguistics
- MS in Computational Linguistics (Department of Linguistics, Department of Computer Science, Information Sciences Institute)
- MA in Linguistics, Hispanic Linguistics, Universidad de Deusto, Bilbao (Spain)
- BA in English Philology, Universidad de Deusto
Syntax and its Interfaces with Semantics and Morphology, Phase Theory, Tense and Aspect, Unaccusativity, Case Theory, Non-finite Clauses; Semi-supervised Ontology Building, Applications of Natural Language Processing for Linguistic Research; Weight as a Processing Constraint in Syntactic Variation; Sociolinguistics and Discourse, Corpus Creation.
Languages: Basque, Spanish, Italian, Romance, Historical Romance, Latin
Refereed journal articles
2006 The interpretation of imperfective aspect in Basque and its implications for our traditional classification of verbs. Journal of Basque Linguistics. (in press)
2003 Two Paradoxes in the Interpretation of Imperfective Aspect and the Progressive. Journal of Cognitive Science 4.1: 79-105.
Other refereed publications
2006 with Mario Saltarelli. The quirky case of participial clauses. Romance Languages and Linguistic Theory 2005: Selected papers from ‘Going Romance’ 2005, Utrecht, ed. Sergio Baauw, Frank Drijkoningen, Ivo van Ginneken and Haike Jacobs.
2006 Against an ontological commitment to unergative verbs. Proceedings of the 40th Meeting of the
2006 with Roberto Mayoral-Hernández. Sociolinguistic factors conditioning the ordering of adverbial expressions in Spanish: A computationally extended corpus analysis. Selected Papers from the 36th Linguistic Symposium on Romance Languages,
2006 A deceptive case of split-intransitivity in Basque. Selected Proceedings of the International Symposium on the Typology of Argument Structure and Grammatical Relations in Languages Spoken in
2006 with Mario Saltarelli. Argument structure of participial clauses: the unaccusative phase. Selected Proceedings of the Hispanic Linguistics Symposium 2006,
2003 The Imperfective Paradox of Basque. USC Working Papers in Linguistics 1.
Edited books
2006 with Roberto Mayoral Hernández and Michal Temkin Martínez. Proceedings of WECOL 2004,
2006 with Irene Barberia, Rebeka Campos Astorkiza and Susana Huidobro. Proceedings of BIDE 2004, Universidad de Deusto, Bilbao. (in press) With a co-authored introduction: ‘A new relay of linguists’
Published conference proceedings and book articles [not refereed]
2006 Transitive intransitives: Basque Unergatives Revisited. Proceedings of the 4th
2006 Defining transitivity and intransitivity: Split-intransitive languages and the Unaccusative Hypothesis. A Festschrift for Larry Trask, ed.
2006 Towards linguistically searchable text. Proceedings of BIDE 2005, Universidad de Deusto,
2004 A Note on the Typological Classification of Basque. Working Papers of the
2003 Verb Classes and Aspectual Interpretation in Basque. Proceedings of Console XI. Università degli Studi di Padova, Padua, Italy.
2002 On the Correlate between Case Assignment and Verbal Form in Basque. Proceedings of the 5th
Unpublished presentations at conferences, workshops and symposia
2007 with Mario Saltarelli. Zanuttini’s Hypothesis: Participial Constructions Revisited. Linguistic Society of
2007 with Roberto Mayoral Hernández. A corpus analysis of weight and unaccusativity in Spanish. Linguistic Society of
2006 with Mario Saltarelli. The case of participial clauses: Italian vs. Romance. Linguistic Society of
2006 The typology of absolute constructions. Hagit Borer’s Morphology Discussion Group. Department of Linguistics,
2005 with Joseba Abaitua. A brief history of machine translation. Computer-Assisted Translation.
2003 Two paradoxes on the interpretation of Imperfective aspect and the progressive. It's about time: Theoretical and experimental perspectives on tense, aspect, modality and events, Linguistic Society of America Linguistic Institute, Michigan State University, July 18-19, 2003.
Published software/resources
2006 Consumer Eroski Parallel Corpus
I turned the Consumer Eroski magazine (http://revista.consumer.es), which publishes press articles written in Spanish and their translations into Basque, Catalan and Galician, into a parallel corpus. The corpus has approximately 1.3 million words for each language, for a combined total of 5.2 million words. It is aligned at the sentence level and is accessible online via Universidade de Vigo and Universidad de Deusto.
http://sli.uvigo.es/CLUVI/ (public access); www.deli.deusto.es (research intranet)
The corpus is unusual: the four major languages spoken in Spain are represented, including Basque, in a standard educated register that serves as a contemporary reference corpus and a basis for computational linguistics and corpus linguistics research. The European Constitution does not yet exist for these languages as a parallel corpus.
2004-05 Mundo-Hispano Search Interface
An application written in Java that uses the Google API to run multiple searches and separate the results by Spanish-speaking country, using the geographical location of the server as an approximation.
Hablamos Juntos (http://www.hablamosjuntos.org/) is a $30 million initiative to improve patient-provider communication for Hispanics with limited English proficiency. Used by professional translators and Hablamos Juntos (HJ) staff to survey and assess the cultural adequacy of translation practice from the US standard in healthcare to the 20 national varieties of Spanish represented by a heterogeneous population.
Unpublished software/utilities
2005-06 Suite of utilities for working with online corpora (programmed in Python)
This suite facilitates the use of the online corpora of the
- Automatic query and data extraction
- Automatic conversion from paragraph to relevant sentence and context
- Automatic annotation of corpus metadata: author (gender), media source, country, topic, publisher, year
- Tool for manual annotation
- Automatic collation of databases with manually annotated data
- Automatic generation of SPSS annotation files to assess significance
2005-06 Suite of utilities for corpus creation (programmed in Python)
The suite comprises the software I developed to create the Consumer Eroski Parallel Corpus. It features the following capabilities:
- Web module for automatic download and storage of raw text
- Text cleanser to eliminate everything but text
- Sentence extractor to organize the corpus at the sentence level (sensitive to punctuation idiosyncrasies of Basque, Catalan, Galician and Spanish)
- Sentence tokenizer utility to interface with SVMTool (Spanish tagger)
- Utility to create input files for Moore’s bilingual sentence aligner
- Utility to decode output files for Moore’s bilingual sentence aligner
- Utility to derive corpus statistics
See the corpus online: http://sli.uvigo.es/CLUVI/ (public access); www.deli.deusto.es (research intranet)
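The sentence extractor in the suite above is unpublished, so the following is only a minimal sketch of the general idea, not the original code. It assumes a boundary is sentence-final punctuation followed by whitespace and then an uppercase letter or an inverted mark (¿, ¡), and that a small abbreviation list blocks false boundaries. The function name `split_sentences` and the `ABBREVIATIONS` list are hypothetical.

```python
import re

# Hypothetical abbreviation list; a real one would be larger and per-language.
ABBREVIATIONS = {"sr.", "sra.", "dr.", "dra.", "etc."}

# Candidate boundary: ., ?, ! or … followed by whitespace and then an
# uppercase letter or an inverted question/exclamation mark.
BOUNDARY = re.compile(r"[.?!…]+(?=\s+[A-ZÁÉÍÓÚÑÜ¿¡])")


def split_sentences(text):
    """Split text into sentences, skipping boundaries after abbreviations."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        candidate = text[start:match.end()]
        last_word = candidate.split()[-1].lower()
        if last_word in ABBREVIATIONS:
            continue  # e.g. "Sr." does not end a sentence
        sentences.append(candidate.strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


for sentence in split_sentences(
    "El Sr. Pérez llegó tarde. ¿Dónde estaba? Nadie lo sabe."
):
    print(sentence)
```

The example prints three sentences, keeping "El Sr. Pérez llegó tarde." whole. Basque, Catalan and Galician would each need their own abbreviation lists and punctuation rules.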
2005-06 Linguistic Search Interface (programmed in Python)
An advanced search interface that supports parallel corpora. The interface features the operators listed below. A frequency breakdown is available when the search seeks patterns.
- Exact search
- Boolean AND, OR and NOT
- Combined Boolean operator search
- Word distance supported search
- Part-of-speech tag supported search
- Combined word distance and part-of-speech tag supported search
- Verb search (e.g. nonfinite, subjunctive)
- Morpheme search (e.g. prefixes)
- Power search (predefined: e.g. relatives, absolutes, indirect questions…)
- Regular expression search
- Chain search: any of the above, applied iteratively
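To illustrate how a combined word distance and part-of-speech search can work, here is a small sketch. It is not the interface's actual code, which is unpublished. It assumes a sentence is a list of (word, tag) pairs; the tag labels and the names `has_tag`, `has_word` and `find_within` are hypothetical.

```python
def has_tag(tag):
    """Return a matcher that accepts tokens carrying the given POS tag."""
    return lambda token: token[1] == tag


def has_word(word):
    """Return a matcher that accepts tokens with the given word form."""
    return lambda token: token[0].lower() == word


def find_within(tokens, first, second, max_distance):
    """Return (i, j) index pairs where a token matching `first` is followed
    by a token matching `second` at most `max_distance` positions later."""
    hits = []
    for i, token in enumerate(tokens):
        if not first(token):
            continue
        stop = min(i + max_distance + 1, len(tokens))
        for j in range(i + 1, stop):
            if second(tokens[j]):
                hits.append((i, j))
    return hits


# Toy tagged sentence; the tag set is illustrative only.
sentence = [("el", "DET"), ("libro", "NOUN"), ("que", "PRON"),
            ("compré", "VERB"), ("ayer", "ADV")]

print(find_within(sentence, has_tag("NOUN"), has_tag("VERB"), 2))  # [(1, 3)]
```

Because matchers are plain functions, a word test and a tag test can be mixed freely in the same query. Chaining searches amounts to running a further query over the hits returned by an earlier one.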
2006 Weight calculator for Spanish (programmed in Python)
Given a string of one or more words in Spanish, this utility calculates the weight of the string in words, syllables and phonemes. Used for research to determine (i) whether the weight of a constituent affects its syntactic position, and (ii) whether some weight measures are better than others.
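The three weight measures can be approximated from spelling alone. The sketch below is a rough, orthography-based illustration and not the original utility, which is unpublished. It assumes that adjacent strong vowels (a, e, o, and accented í, ú) form separate syllables while other vowel sequences form diphthongs, and that ch, ll and rr count as one phoneme, h is silent, and x counts as two. All function names are hypothetical.

```python
import re

STRONG = set("aeoáéóíú")  # accented í/ú break diphthongs, so treat as strong
WEAK = set("iuü")
VOWELS = STRONG | WEAK
SILENT_U = re.compile(r"(?<=[qg])u(?=[eiéí])")  # que, qui, gue, gui


def count_syllables(word):
    """Count vowel nuclei, merging diphthongs and splitting hiatus."""
    word = SILENT_U.sub("", word.lower())
    if word.endswith("y"):
        word = word[:-1] + "i"  # final y behaves like a weak vowel (muy, hoy)
    count = 0
    prev = None  # previous character if it was a vowel, else None
    for ch in word:
        if ch in VOWELS:
            if prev is None or (prev in STRONG and ch in STRONG):
                count += 1  # new nucleus, or hiatus between strong vowels
            prev = ch
        else:
            prev = None
    return count


def count_phonemes(word):
    """Approximate the phoneme count from spelling."""
    word = SILENT_U.sub("", word.lower())
    word = re.sub(r"ch|ll|rr", "C", word)  # digraphs count once
    word = word.replace("h", "")           # remaining h is silent
    word = word.replace("x", "ks")         # x counted as two phonemes
    return sum(1 for ch in word if ch.isalpha())


def weight(text):
    """Return the weight of a string in words, syllables and phonemes."""
    words = re.findall(r"[^\W\d_]+", text)
    return {
        "words": len(words),
        "syllables": sum(count_syllables(w) for w in words),
        "phonemes": sum(count_phonemes(w) for w in words),
    }


print(weight("la guerra civil"))
# {'words': 3, 'syllables': 5, 'phonemes': 11}
```

A spelling-based count like this ignores dialectal variation and exceptional hiatus, so a research tool would likely need a pronunciation lexicon or finer syllabification rules.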
Resource/Product Demonstrations
Demonstration of Consumer Eroski Parallel Corpus and Linguistic Search Interface
2005 DELi Computational Linguistics Group, Universidad de Deusto (Bilbao, Spain)
2005 Eroski Foundation Headquarters (Elorrio, Spain)
2005 Computer-Assisted Translation. International Summer Courses of the University of the Basque Country. Palacio Miramar (San Sebastian, Spain)
2005 Xavier Gómez Guinovart, Director of SLI Computational Linguistics Group, Universidade de Vigo (Vigo, Spain)
Demonstration of Mundo Hispano Search Engine
2005 Webcast conference call for the online application. Hablamos Juntos National Program Office, Tomás Rivera Policy Institute (Los Angeles, California)
2004 Webcast conference call for the offline prototype. Hablamos Juntos National Program Office, Tomás Rivera Policy Institute (Los Angeles, California)
For the Linguistics Department,
2006 Graduate Students in Linguistics Constitution Revision Committee
2005 Student-Faculty liaison
2005 Co-edited proceedings of Western Conference in Linguistics 2004
2004 Co-organized Western Conference in Linguistics 2004
2004 Co-created Parallel Session in Hispanic Linguistics for Western Conference in Linguistics 2004
2003-04 President, Hispanic Linguistics Student Association
2003 Co-author of constitution for Hispanic Linguistics Student Association
2003 Co-founded Hispanic Linguistics Student Association
2003 Co-founded Computational Linguistics Student Association
2001 Co-organized West Coast Conference in Formal Linguistics XX
For the English Department, Universidad de Deusto (Bilbao, Spain)
2004-06 Liaison for the Universidad de Deusto Study Abroad Program
2005 Co-edited Proceedings of Bilbao-Deusto Student Conference 2004
2004 Co-organized Bilbao-Deusto Student Conference 2004
2004 Reviewer for Bilbao-Deusto Student Conference 2004
2003 Co-founded Bilbao-Deusto Student Conference in Linguistics
Computational Linguistics Association
American Association for the Advancement of Science