


  • linguistic data vs corpus data

  • Forms: signal and symbol

  • Types: written, spoken, multimodal, etc.

  • Static vs Dynamic (on the move).



Linguists often take it for granted that there is such a thing as a sentence because the data they work with normally shows it. In almost all cases that data is written or otherwise recorded language, and this holds even for transcriptions of spoken language.

Sentences are linear arrangements of words to which a syntactic structure can be assigned and which feature, in most Indo-European languages, a finite verb in their main clause.... The notion .... owing itself to the introduction of writing. Writing presupposes standardisation much more than an oral language because a written text must stand for itself, it must be interpretable even in the absence of the writer. (W. Teubert, 2010. Rethinking Corpus Linguistics)


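  • A corpus is a digitised language database that carries linguistic annotation. The raw material itself is neutral (theory-independent), but the annotation is always subjective.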

  • It is usually paired with corpus search and analysis tools.
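The two points above concern annotated data and search tools. They can be illustrated with a minimal keyword-in-context (KWIC) concordancer. This is a sketch under stated assumptions: the toy corpus, its POS tags (borrowed from the Universal Dependencies UPOS labels), and the function name `kwic` are all hypothetical and do not come from any particular corpus tool. The sketch shows that searching by word form uses only the raw text, while searching by tag depends entirely on the annotator's choices.

```python
from typing import List, Optional, Tuple

# Hypothetical toy corpus: each token is a (word, POS tag) pair.
# The words are the raw material; the tags are one analyst's interpretation of it.
Token = Tuple[str, str]

corpus: List[Token] = [
    ("the", "DET"), ("cat", "NOUN"), ("sat", "VERB"), ("on", "ADP"),
    ("the", "DET"), ("mat", "NOUN"), (".", "PUNCT"),
    ("a", "DET"), ("dog", "NOUN"), ("chased", "VERB"), ("the", "DET"),
    ("cat", "NOUN"), (".", "PUNCT"),
]


def kwic(tokens: List[Token], word: Optional[str] = None,
         tag: Optional[str] = None, width: int = 3) -> List[str]:
    """Return keyword-in-context lines for tokens matching `word` and/or `tag`."""
    lines = []
    for i, (w, t) in enumerate(tokens):
        word_ok = word is None or w.lower() == word.lower()
        tag_ok = tag is None or t == tag
        if word_ok and tag_ok:
            # Up to `width` words of left and right context, shown without tags.
            left = " ".join(x for x, _ in tokens[max(0, i - width):i])
            right = " ".join(x for x, _ in tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>20} [{w}] {right}")
    return lines


# Search by word form (uses only the raw text) ...
for line in kwic(corpus, word="cat"):
    print(line)

# ... or by annotation (only possible because someone tagged the data).
for line in kwic(corpus, tag="VERB"):
    print(line)
```

Real concordancers add regular-expression queries, sorting of context, and frequency counts. This sketch only shows the basic lookup.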

Corpus linguistics (in the present era)

  • A way of doing linguistics by looking for structures and patterns in the data, but one that needs to keep pace with the spirit of the times: Big data, Crowd-sourcing, Hack and Make, Collective Intelligence, Individual computing, etc.

  • Methodology for building and analysing corpora in the age of big data.

    • corpus-based/corpus-driven (cf. supervised/unsupervised paradigm in machine learning).

    • Sample size and inference

  • Integration with cognitive science, neuroscience and psychology

    • A wide range of empirical methods has been used to investigate the relationship between observable behaviour and the underlying mental and neural processes.

  • Integration with the digital humanities, history and the social sciences
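The "Sampling size and Inference" point in the methodology list can be made concrete with a frequency comparison between two corpora. The sketch below uses the log-likelihood (G²) statistic, a common choice in corpus linguistics for this kind of comparison. It is one option among several and is not a method prescribed by these notes. The function name `log_likelihood` and all counts are hypothetical. The example keeps the relative frequencies fixed and grows the samples, which shows how the same observed difference can move from non-significant to significant purely because of sample size.

```python
import math


def log_likelihood(a: int, b: int, c1: int, c2: int) -> float:
    """Log-likelihood (G2) for a word seen `a` times in corpus 1 (c1 tokens)
    and `b` times in corpus 2 (c2 tokens)."""
    total = a + b
    # Expected counts if the word were equally frequent in both corpora.
    e1 = c1 * total / (c1 + c2)
    e2 = c2 * total / (c1 + c2)
    g2 = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # the term 0 * ln(0) is taken to be 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2


CRITICAL_P05 = 3.84  # chi-square critical value, 1 degree of freedom, p < 0.05

# Hypothetical counts: the same relative frequencies (0.2% vs 0.1%)
# observed in small samples and in samples ten times larger.
for a, b, size in ((20, 10, 10_000), (200, 100, 100_000)):
    g2 = log_likelihood(a, b, size, size)
    verdict = "significant" if g2 > CRITICAL_P05 else "not significant"
    print(f"corpus size {size:>7}: G2 = {g2:.2f} ({verdict} at p < 0.05)")
```

With these counts, G² is about 3.40 for the smaller samples, just below the 3.84 threshold. For the larger samples it is about 33.98. A corpus-driven keyword list built this way therefore depends on corpus size as well as on the language itself.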

