One of the difficulties in building an SQL-like query language for the Web is the absence of a database (131) for this huge, heterogeneous repository of information. However, if we are interested in HTML documents only, we can construct a virtual schema from the implicit structure of these files. Thus, at the highest level of (132), every such document is identified by its Uniform Resource Locator (URL) and has a (133) and a text. Also, Web servers provide some additional information such as the type, length, and the last modification date of a document. So for data mining purposes, we can consider the set of all HTML documents as a relation:
Document (url, title, text, type, length, modif)
where all the (134) are character strings. In this framework, an individual document is identified with a (135) in this relation. Of course, if some optional information is missing from the HTML document, the associated fields will be left blank, but this is not uncommon in any database.
A. relation
B. field
C. script
D. tuple
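Reference answer: D
Explanation:
[Analysis]: One of the difficulties in building an SQL-like query language for the Web is the absence of a database schema for this huge, heterogeneous repository of information. However, if we are interested in HTML documents only, we can construct a virtual schema from the implicit structure of these files. Thus, at the highest level of abstraction, every such document is identified by its URL and has a title and a text. In addition, Web servers provide some additional information, such as the type, length, and last modification date of a document. So, for data mining purposes, we can consider the set of all HTML documents as a relation: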
Document (url, title, text, type, length, modif)
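where all the attributes are character strings. In this framework, an individual document is identified with a tuple in this relation. Of course, if some optional information is missing from the HTML document, the associated fields will be left blank, but this is common in any database.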
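
The relational view described above can be made concrete with a small sketch. The following Python example uses the standard `sqlite3` module to build an in-memory Document relation in which every attribute is a character string and each HTML document is one tuple. The sample URLs and values, and the helper names `build_document_relation` and `untitled_urls`, are assumptions made for illustration only; they do not come from the passage.

```python
import sqlite3

# Hypothetical sample rows; the URLs and values are made up for illustration.
SAMPLE_DOCS = [
    ("http://example.org/a.html", "Page A", "Hello Web", "text/html", "1024", "1996-05-01"),
    # Optional information is missing here: title and modif are left blank (NULL).
    ("http://example.org/b.html", None, "No title here", "text/html", "512", None),
]


def build_document_relation(rows):
    """Create an in-memory Document relation and load one tuple per document."""
    conn = sqlite3.connect(":memory:")
    # All attributes are declared as character strings, as in the passage.
    conn.execute(
        "CREATE TABLE Document ("
        "url TEXT PRIMARY KEY, title TEXT, text TEXT, "
        "type TEXT, length TEXT, modif TEXT)"
    )
    conn.executemany("INSERT INTO Document VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def untitled_urls(conn):
    """Return the URLs of documents whose title field is blank."""
    cur = conn.execute("SELECT url FROM Document WHERE title IS NULL ORDER BY url")
    return [row[0] for row in cur]


conn = build_document_relation(SAMPLE_DOCS)
print(untitled_urls(conn))
```

Here the URL serves as the key that identifies each tuple, and a missing optional field is simply stored as NULL, which matches the remark that blank fields are nothing unusual in a database.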