A Few Terms (robots.txt/POST/Phrase chunking)


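Introduction: While working through a process recently, I ran into a few terms I had not come across before. I am collecting some of them here for future reference.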

1、 robots.txt
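robots.txt is a plain-text file. By declaring in it the parts of the site that robots should not access, a site can keep some or all of its content out of search-engine indexes, or tell search engines to index only specified content. When a search robot visits a site, it first checks whether a robots.txt file exists in the site's root directory. If the file is found, the robot uses its contents to decide which parts of the site it may visit; if the file does not exist, the robot simply crawls by following links.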
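Because the file always sits at the root of the host, a crawler can derive its location from any page URL by keeping only the scheme and host (including any port) and replacing the path. A minimal Python sketch using the standard `urllib.parse` module; the function name `robots_url` is hypothetical, chosen for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # keep scheme and netloc (host plus optional port); drop path, query and fragment
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.example.com/welcome.html"))
# http://www.example.com/robots.txt
```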

It works like this: a robot wants to visit a Web site URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:

User-agent: *
Disallow: /

The “User-agent: *” means this section applies to all robots.

The “Disallow: /” tells the robot that it should not visit any pages on the site.
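Python's standard library includes a parser for these rules, `urllib.robotparser`. The sketch below feeds it the example file above as a string (so nothing is downloaded) and asks whether a crawler may fetch a page. The helper name `allowed` and the second, directory-only rule set are hypothetical, added only for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Check whether `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse the rules without network access
    return parser.can_fetch(agent, url)


# the example file: block every robot from the whole site
block_all = "User-agent: *\nDisallow: /"
print(allowed(block_all, "http://www.example.com/welcome.html"))  # False

# hypothetical rules that only block one directory
block_private = "User-agent: *\nDisallow: /private/"
print(allowed(block_private, "http://www.example.com/welcome.html"))    # True
print(allowed(block_private, "http://www.example.com/private/a.html"))  # False
```

Note that the two directives sit on consecutive lines: in robots.txt a blank line ends a record, so a blank line between `User-agent` and `Disallow` would detach the rule from its user agent.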

There are two important considerations when using /robots.txt:

  • robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities and email address harvesters used by spammers will pay no attention to it.
  • the /robots.txt file is a publicly available file. Anyone can see what sections of your server you don’t want robots to use.

So don’t try to use /robots.txt to hide information.

Source: http://www.robotstxt.org/robotstxt.html

2、Part-of-speech tagging

Part-of-speech tagging (POS tagging or POST), also called grammatical tagging, is the process of marking up each word in a text as corresponding to a particular part of speech, based on both its definition and its context, i.e., its relationship with adjacent and related words in a phrase, sentence, or paragraph. A simplified form of this is commonly taught to school-age children, who learn to identify words as nouns, verbs, adjectives, adverbs, etc. Once performed by hand, POS tagging is now done in the context of computational linguistics, using algorithms that associate discrete terms, as well as hidden parts of speech, with a set of descriptive tags.

Source: en.wikipedia.org/wiki/Part-of-speech_tagging
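To make the role of context concrete, here is a minimal sketch of one classic statistical approach among many: a hidden Markov model (HMM) tagger decoded with the Viterbi algorithm, where the tags are the hidden states. The tag set (DET, NOUN, PRON, VERB), the tiny vocabulary and every probability are hypothetical, hand-picked for illustration rather than estimated from a corpus. The ambiguous word "fish" is tagged differently depending on the word before it.

```python
# Toy HMM tagger: all tags, words and probabilities are hypothetical.
TAGS = ["DET", "NOUN", "PRON", "VERB"]

# P(first tag)
START = {"DET": 0.5, "PRON": 0.4, "NOUN": 0.05, "VERB": 0.05}

# P(next tag | current tag); missing pairs have probability 0
TRANS = {
    "DET": {"NOUN": 0.9, "VERB": 0.1},
    "NOUN": {"VERB": 0.6, "NOUN": 0.2, "DET": 0.2},
    "PRON": {"VERB": 0.9, "NOUN": 0.1},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "PRON": 0.2},
}

# P(word | tag); "fish" can be a NOUN or a VERB
EMIT = {
    "DET": {"the": 1.0},
    "NOUN": {"fish": 0.5, "dog": 0.5},
    "PRON": {"they": 1.0},
    "VERB": {"fish": 0.5, "swim": 0.5},
}


def viterbi(words):
    """Return the most probable tag sequence for a list of words."""
    # best[i][tag] = (probability of the best path ending in tag at i, previous tag)
    best = [{t: (START.get(t, 0.0) * EMIT[t].get(words[0], 0.0), None) for t in TAGS}]
    for i in range(1, len(words)):
        column = {}
        for tag in TAGS:
            emit = EMIT[tag].get(words[i], 0.0)
            # choose the predecessor tag that maximises the path probability
            column[tag] = max(
                (best[i - 1][prev][0] * TRANS[prev].get(tag, 0.0) * emit, prev)
                for prev in TAGS
            )
        best.append(column)
    # follow the back-pointers from the most probable final tag
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


print(viterbi(["they", "fish"]))         # ['PRON', 'VERB']
print(viterbi(["the", "fish", "swim"]))  # ['DET', 'NOUN', 'VERB']
```

Multiplying raw probabilities is fine for short toy sentences; a practical tagger would work with log probabilities to avoid underflow.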

3、Phrase chunking

Phrase chunking is a phase of natural language processing that separates and segments a sentence into its subconstituents, such as noun, verb, and prepositional phrases.

Source: en.wikipedia.org/wiki/Phrase_chunking
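Chunking usually runs on text that has already been POS-tagged. The sketch below is a minimal hand-written, rule-based chunker over (word, tag) pairs; the tags are assigned by hand and the rules (NP = optional determiner, any adjectives, one or more nouns; PP = preposition plus NP; VP = one or more verbs) are hypothetical simplifications, not a standard grammar.

```python
# Minimal rule-based chunker over (word, tag) pairs; the rules are hypothetical.
def match_np(tagged, i):
    """Return the end index of a noun phrase DET? ADJ* NOUN+ starting at i, or i if none."""
    j = i
    if j < len(tagged) and tagged[j][1] == "DET":
        j += 1
    while j < len(tagged) and tagged[j][1] == "ADJ":
        j += 1
    k = j
    while k < len(tagged) and tagged[k][1] == "NOUN":
        k += 1
    return k if k > j else i  # an NP needs at least one noun


def chunk(tagged):
    """Greedily split a tagged sentence into NP, VP and PP chunks; other words get 'O'."""
    chunks = []
    i = 0
    while i < len(tagged):
        end = match_np(tagged, i)
        if end > i:
            label = "NP"
        elif tagged[i][1] == "ADP" and match_np(tagged, i + 1) > i + 1:
            label, end = "PP", match_np(tagged, i + 1)  # preposition plus its NP
        elif tagged[i][1] == "VERB":
            label, end = "VP", i + 1
            while end < len(tagged) and tagged[end][1] == "VERB":
                end += 1
        else:
            label, end = "O", i + 1  # word outside any chunk
        chunks.append((label, " ".join(word for word, _ in tagged[i:end])))
        i = end
    return chunks


sentence = [("the", "DET"), ("quick", "ADJ"), ("fox", "NOUN"), ("jumped", "VERB"),
            ("over", "ADP"), ("the", "DET"), ("lazy", "ADJ"), ("dog", "NOUN")]
print(chunk(sentence))
# [('NP', 'the quick fox'), ('VP', 'jumped'), ('PP', 'over the lazy dog')]
```

Real chunkers are usually trained on annotated data rather than written by hand, and some annotation schemes (for example CoNLL-2000) label only the preposition as PP and tag the following noun phrase as a separate NP.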