Common Language Resources and Technology Infrastructure (CLARIN)
CLARIN is committed to establishing an integrated and interoperable research infrastructure of language resources and technology. It aims to overcome the current fragmentation by offering a stable, persistent, accessible and extendable infrastructure, thereby enabling eHumanities.
- integrated: the resource and service centres are connected via Grid technology and form a virtually integrated domain
- interoperable: the resources and services will be based on Semantic Web technologies to overcome format, structure and terminological differences
- stable: the resources and services are offered with a high availability
- persistent: the resources and services are planned to be accessible for many years so that researchers can rely on them
- accessible: the resources and services are accessible via the web; different access methods and training options are offered, tailored to the needs of the communities that use them
- extendable: the infrastructure is open so that new resources and services can be added easily
Communities to serve
CLARIN offers its services to
- the different communities of linguists, to optimize their models and tools for the benefit of everyone who uses language material
- humanities scholars in the broad sense, to facilitate access to language resources and technology
- society at large, to lower the thresholds for access to multicultural and multilingual content
CLARIN is devoted to creating a pan-European infrastructure that will
- boost humanities research in a region as multicultural and multilingual as Europe
- facilitate multilingual and multicultural education in schools, colleges and universities
CLARIN is aware of the need to address the challenges posed by the increasing interest in electronic and multimedia communication. If we want to preserve cultural identity on the one hand, and prepare younger generations for global competition on the other, we have to improve their multilingual and multicultural awareness. CLARIN wants to make essential contributions to this goal.
The purpose of the infrastructure is to offer persistent, secure services that provide easy access to language processing resources. As language, speech and vision technology improves, it should become commonplace to carry out tasks such as 'summarize Le Monde from 11th March 2007', 'list all uses of “enthusiasm” in 19th century English novels written by women', or 'find all video clips of Tony Blair on the BBC in 2007'. Without the proper infrastructure, however, the technologies that make these tasks possible will be available only to a few specialists. At present one needs to find an appropriate program (for translation, summarization, information extraction, etc.), download it, make sure it is compatible with the computer that will run it, understand the form of input it takes, download the data (e.g. novels, newspapers, corpora, videos), and convert the data to the correct format for the program, all before one can get started.
For most researchers outside computer science, at least one of these tasks will be an insurmountable barrier. Our vision is that the resources for processing language, the data to be processed, and appropriate guidance, advice and training are made available over a distributed network and can be accessed from the user's desktop. CLARIN proposes to make this vision a reality: the user will have access to guidance and advice through distributed knowledge centres and, via a single sign-on, to repositories of data with standardized descriptions and to processing tools ready to operate on standardized data. All of this will be available on the internet through a service-oriented architecture based on secure grid technologies.
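To illustrate why standardized descriptions matter, here is a minimal Python sketch of how a task such as 'list all uses of “enthusiasm” in 19th century English novels written by women' reduces to a metadata filter followed by a text search once every resource is described in the same way. The record fields, function names and toy catalogue are hypothetical and invented for this illustration; they do not represent CLARIN's actual metadata schema or services.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceDescription:
    """Hypothetical standardized description of one text resource."""
    title: str
    language: str       # e.g. "en"
    genre: str          # e.g. "novel"
    year: int           # year of publication
    author_gender: str  # e.g. "female"
    text: str           # the content itself (toy placeholder here)


def select(catalogue, *, language, genre, year_range, author_gender):
    """Keep only resources whose description matches all criteria."""
    first, last = year_range
    return [
        r for r in catalogue
        if r.language == language
        and r.genre == genre
        and first <= r.year <= last
        and r.author_gender == author_gender
    ]


def find_uses(resources, word):
    """Return (title, sentence) pairs for sentences containing the word."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    hits = []
    for r in resources:
        # Naive sentence split on ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", r.text):
            if pattern.search(sentence):
                hits.append((r.title, sentence.strip()))
    return hits


# Toy catalogue with placeholder records (not real data).
catalogue = [
    ResourceDescription("Novel A", "en", "novel", 1847, "female",
                        "Her enthusiasm was plain to all. The rain fell."),
    ResourceDescription("Novel B", "en", "novel", 1850, "male",
                        "His enthusiasm faded."),
    ResourceDescription("Novel C", "en", "novel", 1925, "female",
                        "Enthusiasm returned."),
    ResourceDescription("Journal D", "fr", "newspaper", 1880, "female",
                        "Quel enthousiasme."),
]

selected = select(catalogue, language="en", genre="novel",
                  year_range=(1801, 1900), author_gender="female")
hits = find_uses(selected, "enthusiasm")
for title, sentence in hits:
    print(f"{title}: {sentence}")
```

In this sketch the researcher never touches file formats or installs tools: the query is expressed entirely in terms of the shared description fields, which is the kind of access the vision above is aiming at.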
The project is therefore primarily about turning existing, fragmented technology and resources into accessible and stable services that any user can share, adapt and repurpose. CLARIN can build upon a rich history of national and European initiatives in this domain, and it aims to ensure that Europe maintains its leading position in humanities and social science research in the current highly competitive era.