Intro to NLP Notes


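My earlier exposure to NLP was never systematic, so I used the course "Introduction to Natural Language Processing" as an opportunity to learn it properly. Other courses worth consulting include: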

Intro

The IPA Chart is especially useful when working on speech recognition, and it also comes with audio pronunciations.

Besides understanding syntax and semantics, a major difficulty in NLP is ambiguity, which comes in the following kinds:

  • Morphological: Joe is quite impossible. Joe is quite important.
  • Phonetic: Joe’s finger got number.
  • Part of speech: Joe won the first round.
  • Syntactic: Call Joe a taxi.
  • PP attachment: Joe ate pizza with a fork / with meatballs / with Samantha / with pleasure.
  • Sense: Joe took the bar exam.
  • Modality: Joe may win the lottery.
  • Subjectivity: Joe believes that stocks will rise.
  • Negation: Joe likes his pizza with no cheese and tomatoes.
  • Referential: Joe yelled at Mike. He had broken the bike. Joe yelled at Mike. He was angry at him.
  • Reflexive: John bought him a present. John bought himself a present.
  • Ellipsis and parallelism: Joe gave Mike a beer and Jeremy a glass of wine.
  • Metonymy: called and left a message for Joe.
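The PP-attachment example can be made concrete. If "ate", "pizza" and "with a fork" are treated as three pre-chunked units (a simplifying assumption; a real parser works over words with a grammar), each reading is a different binary bracketing: ((ate pizza) with a fork) attaches the PP to the verb, so the fork is the instrument, while (ate (pizza with a fork)) attaches it to the noun, as in "pizza with meatballs". The minimal sketch below uses a hypothetical helper, `bracketings`, to enumerate every bracketing of a chunk sequence. It does not check grammaticality, so it only illustrates how quickly candidate structures multiply.

```python
def bracketings(chunks):
    """Return every full binary bracketing of `chunks` as nested tuples.

    `chunks` is a list of pre-chunked strings. This is a toy helper that
    ignores grammar, so some bracketings may not be linguistically valid.
    """
    if len(chunks) == 1:
        return [chunks[0]]
    results = []
    # Split the sequence at every position, bracket each side recursively,
    # and pair every left structure with every right structure.
    for split in range(1, len(chunks)):
        for left in bracketings(chunks[:split]):
            for right in bracketings(chunks[split:]):
                results.append((left, right))
    return results


if __name__ == "__main__":
    for tree in bracketings(["ate", "pizza", "with a fork"]):
        print(tree)
    # ('ate', ('pizza', 'with a fork'))   PP attached to the noun
    # (('ate', 'pizza'), 'with a fork')   PP attached to the verb
```

With 3 chunks the sketch returns exactly the 2 readings; with 4 and 5 chunks it returns 5 and 14 bracketings. The counts follow the Catalan numbers, which is one reason attachment ambiguity grows so fast as more PPs are added.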

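Beyond the problems above, NLP is also made hard by non-standard language (slang, neologisms, and the like), grammatical errors, misspellings, parsing problems, complex sentences, humor and sarcasm, reference resolution, implied meaning, and more.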

The course then introduces the kinds of linguistic knowledge involved:

  • Phonetics and phonology - the study of sounds
  • Morphology - the study of word components
  • Syntax - the study of sentence and phrase structure
  • Lexical semantics - the study of the meanings of words
  • Compositional semantics - how to combine words
  • Pragmatics - how to accomplish goals
  • Discourse conventions - how to deal with units larger than utterances

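It then covers some historical linguistics, such as Proto-Indo-European (PIE) and the language families and branches descended from it. Of particular importance are the laws of sound change: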

Grimm’s Law

  • Voiceless stops turn into voiceless fricatives
  • Voiced stops become voiceless stops
  • Voiced aspirated stops change to voiced stops or fricatives

Examples

  • Ancient Greek: πούς, Latin: pēs, Sanskrit: pāda – English: foot, German: Fuß, Swedish: fot
  • Ancient Greek: κύων, Latin: canis, Welsh: ci – English: hound, Dutch: hond, German: Hund
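The three correspondences can be written as a lookup table. The minimal sketch below is a simplification under stated assumptions: words are pre-split into segments, "θ" and "x" stand for the new voiceless fricatives (Germanic x later became h, as in hound), the aspirates are spelled "bʰ", "dʰ", "gʰ", and rule 3 always yields the stop rather than the fricative. It ignores conditioning environments and known exceptions such as Verner's law. The table `GRIMM` and the helper `apply_grimm` are hypothetical illustrations, and the root shapes are simplified, not full reconstructions.

```python
# Simplified Grimm's Law correspondences (hypothetical segment spellings:
# "bʰ" etc. for voiced aspirates, "θ" and "x" for the new fricatives).
GRIMM = {
    # Rule 1: voiceless stops -> voiceless fricatives
    "p": "f", "t": "θ", "k": "x",
    # Rule 2: voiced stops -> voiceless stops
    "b": "p", "d": "t", "g": "k",
    # Rule 3: voiced aspirated stops -> voiced stops (fricative outcomes ignored)
    "bʰ": "b", "dʰ": "d", "gʰ": "g",
}


def apply_grimm(segments):
    """Map each segment through GRIMM once; unlisted segments are unchanged."""
    # A single lookup per segment applies all rules simultaneously, so the
    # output of one rule is never fed into another (d -> t stays t, not θ).
    return [GRIMM.get(seg, seg) for seg in segments]


if __name__ == "__main__":
    print(apply_grimm(["p", "o", "d"]))  # ['f', 'o', 't']  cf. Greek pod-, English foot
    print(apply_grimm(["k", "u", "n"]))  # ['x', 'u', 'n']  cf. Greek κύων, English hound
```

Looking each segment up exactly once is the key design choice. Applying the rules one after another would let them feed each other (d becomes t under rule 2, then θ under rule 1), which would wrongly turn Latin duo's d into θ instead of the t of English two.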

It also lists a few links about the world's languages:


Original post: http://blog.josephjctang.com/2015-10/notes-of-intro-to-NLP/
