Glossary of Corpus Linguistics Terms (语料库语言学术语汇编), V2.0
Last updated 2012-10-08 by 许家金 (Xu Jiajin)

Aboutness: 所言之事
Absolute frequency: 绝对频数
Alignment (of parallel texts): (平行或对应)语料的对齐
Alphanumeric: 字母数字构成的
Annotate: 标注(动词)
Annotated text/corpus: 标注文本/语料库、赋码文本/语料库
Annotation: 标注(名词)
Annotation scheme: 标注方案
ANSI/American National Standards Institute: 美国国家标准学会
ASCII/American Standard Code for Information Exchange: 美国信息交换标准码
Associates (of keywords): (主题词的)联想词
AWL/academic word list: 学术词表
Balanced corpus: 平衡语料库
Base list/baselist: 底表、基础词表
Bigram: 二元组、二元序列、二元结构
Bi-text/bitext: 双语合并文本、双语分行对齐文本(一句源语一句目标语对齐后的文本)
Bi-hapax: 两次词
Bilingual corpus: 双语语料库
Bootcamp debate/discourse/discussion: (新手)训练营大辩论/话语/大探讨
CA/Contrastive Analysis: 对比分析
Case-sensitive/case sensitivity: 大小写敏感、区分大小写
Category-based approach: 基于类(范畴)的方法
Chi-square test/χ²: 卡方检验
Chunk: 词块
CIA/Contrastive Interlanguage Analysis: 中介语对比分析
CLAWS/Constituent Likelihood Automatic Word-tagging System: CLAWS 词性赋码系统
Clean text policy: 干净文本原则
Cluster: 词簇、词丛
Colligation: 类联接、类连接、类联结
Collocate n./v.: 搭配词;搭配
Collocability: 搭配强度、搭配力
Collocation: 搭配、词语搭配
Collocational strength: 搭配强度
Collocational framework/frame: 搭配框架
Collocational profile: 搭配概貌
Collocational network: 搭配网络
Comparable corpora: 类比语料库、可比语料库
Computational Linguistics: 计算语言学
ConcGram/concgram: 同现词列、框合结构
Concord: 索引(行)(简略形式)
Concordance (line): 索引(行)
Concordance plot: (索引)词图
Concordancer: 索引工具
Concordancing: 索引分析
Context: 语境、上下文
Context word: 语境词
Contextual prosody: 语境韵律
Contingency table: 连列表、联列表、列连表、列联表
Co-occurrence/Co-occurring: 共现、同现
Corpus Linguistics: 语料库语言学
Corpus, pl. corpora: 语料库
Corpus-based: 基于语料库的
Corpus-based translation studies: 基于语料库的翻译研究、语料库翻译学、基于语料库的译学研究
Corpus-driven: 语料库驱动的
Corpus-informed: 语料库指导下的、参考了语料库的
Corpus size: 库容
Corpus stylistics: 语料库文体学
Co-select/co-selection/co-selectiveness: 共选(机制)
Co-text: 共文
Data mining: 数据挖掘
DDL/Data Driven Learning: 数据驱动学习
Dependency: (句法)依存关系
Dice coefficient: Dice 系数
Disambiguation: 消歧
Diachronic corpus: 历时语料库
Discourse: 话语、语篇
Discourse prosody: 话语韵律
Documentation: 文检报告、备检文件、说明文档
EAGLES/Expert Advisory Groups on Language Engineering Standards: EAGLES 文本规格
Empirical linguistics: 实证语言学
Empiricism: 经验主义
Encoding: 字符编码
Error-tagging: 错误标注、错误赋码
Explicitation: 显化
Extended unit of meaning: 扩展意义单位
File-based search/concordancing: 批量检索
Firthian (linguistics): 弗斯(语言学)、弗斯学派的(语言学)
Formulaic sequence: 程式化序列、套语
Frequency: 频数、频率
Frequency list: 词频表
General (purpose) corpus: 通用语料库
Genre: 语体、体裁
Grammatical patterning: 语法型式
Granularity: 颗粒度
Hapax legomenon/hapax: 一次词
Header/corpus head: 文本头、头标、头文件
Hidden Markov model (HMM): 隐马尔科夫模型、隐马模型
Idiom principle: 习语原则、成语原则
Idiomaticity: 习语性、地道程度
Implicitation: 隐化
Index/indexing: (建)索引
In-line annotation: 文内标注、行内标注
Interlanguage: 中介语、过渡语
Inter-coder agreement/reliability: 标注者间一致性/信度
Introspection/introspective: 内省(式)(的)
Intuition: 直觉
Key keywords: 关键主题词
Keyness: 主题性、关键性
Keywords: 主题词
KWIC/Key Word in Context: 语境中的关键词、语境共现(方式)
KWIC sort: 语境共现排序、索引行排序
Learner corpus: 学习者语料库
Lemma, pl. lemmata/lemmas: 词目、原形词、词元
Lemmatization: 词形还原、词元化
Lemmatizer: 词形还原工具、词元化工具
Lexical bundle: 词束
Lexical density: 词汇密度
Lexical frequency profile: 词频概貌
Lexical grammar: 词汇语法
Lexical item: 词项、词语项目
Lexical patterning: 词语型式、词汇型式
Lexical priming: 词汇触发理论、词汇启动理论
Lexical profile: 词汇分布概貌
Lexical richness: 词汇丰富度
Lexico-grammar: 词汇语法
Lexis: 词语、词项、词语学
Log-likelihood ratio: 对数似然比、对数似然率
Longitudinal/developmental corpus: 跟踪语料库、发展语料库、历时语料库
Machine-readable: 机读的
Machine translation: 机器翻译
Manual annotation: 手工标注
Markup/mark-up: 标记、置标
MDA (Multi-dimensional analysis/approach): 多维度分析法
Metadata: 元信息
Meta-metadata: 元元信息
MF/MD approach/multi-feature/multi-dimensional analysis: 多特征/多维度分析法
Misuse: 误用
Monitor corpus: (动态)监察语料库
Monolingual corpus: 单语语料库
Multilingual corpus: 多语语料库
Multimodal corpus: 多模态语料库
MWU/multiword unit: 多词单位
MWE/multiword expression: 多词表达
MI/mutual information: 互信息、互现信息
N-gram: N 元组、N 元序列、N 元结构、N 元词、多词序列
Neo-Firth (school): 新弗斯学派
Neo-Firthian: 新弗斯学派的
NLP/Natural Language Processing: 自然语言处理
Node (word): 节点(词)
Normalization: 标准化、(翻译)规范化、泛化
Normalized frequency: 标准化频率、标称频率、归一频率
Observed corpus: 观察语料库
Ontology: 知识本体、本体
Open choice principle: 开放选择原则
Orthographic
Orthography: 正字法
Overuse: 过多使用、超用、使用过度、过度使用
Paradigmatic: 纵聚合(关系)的
Parallel corpus: 平行语料库、对应语料库
Parole linguistics: 言语语言学
Parsed corpus: 句法标注的语料库、树库
Parser: 句法分析器
Parsing: 句法标注、句法分析
Pattern/patterning: 型式、模式
Pattern grammar: 型式语法
Pattern matching: 模式匹配
Pedagogic corpus: 教学语料库
Phraseology: 短语、短语学
Phraseological unit/sequence: 短语单位/序列
Plain text: 纯文本
POSgram: 赋码序列、码串
POS sequence: 赋码序列、码串
POS tagging/Part-of-Speech tagging: 词性赋码、词性标注、词性附码
POS tagger: 词性赋码器、词性赋码工具
Prefab: 预制语块
Probabilistic: (基于)概率的、概率性的、盖然的
Probabilistic grammar: 概率语法、概率性语法、盖然语法
Probability: 概率
Query: 查询、检索
Range: 分布(范围)、跨度
Rationalism: 理性主义
Raw frequency: 原始频数、生频数
Raw text/corpus: 生文本/生语料
Reference corpus: 参照语料库
Regex/RE/RegExp/regular expressions: 正则表达式、正则式
Register variation: 语域变异
Relative frequency: 相对频率
Representative/representativeness: 代表性(的)
Rule-based: 基于规则的
S-universals: 源语型共性(特征)
Sample n./v.: 样本;取样、采样、抽样
Sampling: 取样、采样、抽样
Sanitization: 净化
Search term: 检索项
Search word: 检索词
Segmentation: 切分、分词
Semantic association: 语义联想
Semantic preference: 语义倾向、语义趋向
Semantic prosody: 语义韵
Sentence alignment: 句对齐、句级对齐
SGML/Standard Generalized Markup Language: 标准通用标记语言
Simplification: 简化
Skipgram: 跨词序列、跨词结构
Span: 跨距
Specialized corpus: 专用语料库、专门用途语料库、专题语料库
Standardized type/token ratio: 标准化类符/形符比、标准化类/形比、标准化型次比
Standardized TTR/STTR: 标准化类符/形符比、标准化类/形比、标准化型次比
Stand-off annotation: 分离式标注
Stochastic: 随机的
Stop list: 停用词表、过滤词表
Stop word: 停用词、过滤词
Synchronic corpus: 共时语料库
Syntagmatic: 横组合(关系)的
T score: T 值
T-universals: 目标语型共性(特征)
Tag: 赋码、标记、附码
Tagger: 赋码器、赋码工具、标注工具
Tagging: 赋码、标注、附码
Tag sequence: 赋码序列、码串
Tagset: 赋码集、码集
Tertium comparationis: 对比中立项、对比基础
Text: 文本
Text type: 文体、文类
Text category: 文体、文类
Text mining: 文本挖掘
TEI/Text Encoding Initiative: TEI 文本编码计划
The Lexical Approach: 词汇中心教学法
The Lexical Syllabus: 词汇大纲
Token: 形符、词次
Token definition/word definition: 形符界定、单词界定
Tokenization: 分词
Tokenizer: 分词工具
Transcription: 转写
Translation memory: 翻译记忆(库)
Translation norms: 翻译规范
Translation universals/Universal features of translation: 翻译共性、翻译普遍特征
Translational corpus: 翻译语料库
Translationese: 翻译体、翻译腔
Treebank: 树库
Trigram: 三元组、三元序列、三元结构
T-score: T 值
Type: 类符、词种、词型
TTR: 类符/形符比、类/形比、型次比
Type/token ratio: 类符/形符比、类/形比、型次比
Underuse: 少用、使用不足
Unicode: 通用码
Unicodify: 按通用码编码、转换为通用码
Unit of meaning: 意义单位
WaC/Web as Corpus: 网络语料库、网库
Wildcard: 通配符
Word alignment: 词对齐、词级对齐
Word form: 词形
Word family: 词族
Word list: 词表
Word sketch: 词语素描
WSD/Word-sense disambiguation: 词义消歧
XML/Extensible Markup Language: 可扩展标记语言
Zipf's Law/Zipfian Law: 齐夫定律
Z score: Z 值

Commonly used corpora (常用语料库)

ACE: Australian Corpus of English
ANC: American National Corpus
ARCHER: A Representative Corpus of Historical English Registers
BASE: British Academic Spoken English Corpus
BAWE: British Academic Written English Corpus
BNC: British National Corpus
BoE: Bank of English
Brown: Brown Corpus
CANCODE: Cambridge and Nottingham Corpus of Discourse in English
CEC: China English Corpus
CEM: Corpus for English Majors
CHILDES: Child Language Data Exchange System
CIC: Cambridge International Corpus
CLEC: Chinese Learners English Corpus
CLOB: 2009 Brown family corpus of British English
COBUILD: Collins Birmingham University International Language Database
COCA: The Corpus of Contemporary American English
COLSEC: College Learners Spoken English Corpus
COLT: Bergen Corpus of London Teenage Language
Crown: 2009 Brown family corpus of American English
FLOB: Freiburg-LOB Corpus of British English
FROWN: Freiburg-Brown Corpus of American English
Helsinki Diachronic corpus: Diachronic part of the Helsinki Corpus of English Texts
HKCSE: Hong Kong Corpus of Spoken English
ICE: International Corpus of English
ICE-GB: International Corpus of English: Great Britain
ICLE: International Corpus of Learner English
JEFLL: Japanese EFL Learner Corpus
LCMC: Lancaster Corpus of Mandarin Chinese
LINDSEI: Louvain International Database of Spoken English Interlanguage
LIVAC: Linguistic Variations in Chinese Speech Communities
LLC: London Lund Corpus
LOB: Lancaster-Oslo/Bergen Corpus
LOCNESS: Louvain Corpus of Native English Essays
LONGDALE: LONGitudinal DAtabase of Learner English
MICASE: Michigan Corpus of Academic Spoken English
MICUSP: Michigan Corpus of Upper-level Student Papers
NESSIE: Native English Speakers' Similarly and Identically-prompted Essays
PACCEL: Parallel Corpus of Chinese EFL Learners
SBCSAE: Santa Barbara Corpus of Spoken American English
SCCSD: The Spoken Chinese Corpus of Situated Discourse
SCORE: Singapore Corpus of Research in Education
SEC: Spoken English Corpus
SECCL: Spoken English Corpus of Chinese Learners
SECOPETS: Spoken English Corpus of Public English Test System
SEU: Survey of English Usage
SWECCL: Spoken and Written English Corpus of Chinese Learners
WECCL: Written English Corpus of Chinese Learners

Last updated 2012-08-08 by 许家金.