Abstract


COMBINE TAG AND VALUE SIMILARITY FOR DATA EXTRACTION AND ALIGNMENT

D.Phani Sri Lakshmi, D.N.S.B.Kavitha


Web databases generate query result pages in response to a user's query. Many applications, such as data integration, need to cooperate with multiple web databases and therefore need to extract the data from these query result pages automatically. We present a data extraction and alignment method called CTVS that combines both tag and value similarity. CTVS automatically extracts data from query result pages by first identifying and segmenting the query result records (QRRs) in the pages and then aligning the segmented QRRs into a table, in which the data values from the same attribute are put into the same column. Specifically, we propose new techniques to handle the case when the QRRs are not contiguous, which may be due to the presence of auxiliary information such as a comment, recommendation, or advertisement, and to handle any nested structure that may exist in the QRRs. We also design a novel record alignment algorithm that aligns the attributes in a record, first pairwise and then holistically. Experimental results show that CTVS achieves high precision and outperforms existing state-of-the-art data extraction methods.
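
To illustrate the general idea of combining tag and value similarity when aligning two records pairwise, a minimal Python sketch follows. It is not the CTVS algorithm itself: the field representation (a tag path plus a text value), the equal weighting, the 0.5 threshold, the greedy matching, and all function names are assumptions made only for this illustration.

```python
from difflib import SequenceMatcher


def tag_similarity(path_a, path_b):
    """Fraction of tags that match position by position, over the longer path."""
    if not path_a and not path_b:
        return 1.0
    matches = sum(1 for a, b in zip(path_a, path_b) if a == b)
    return matches / max(len(path_a), len(path_b))


def value_similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (stand-in for a real value measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def combined_similarity(field_a, field_b, tag_weight=0.5):
    """Weighted sum of tag and value similarity; the weight is an assumed parameter."""
    (path_a, val_a), (path_b, val_b) = field_a, field_b
    return (tag_weight * tag_similarity(path_a, path_b)
            + (1 - tag_weight) * value_similarity(val_a, val_b))


def align_pair(record_a, record_b, threshold=0.5):
    """Greedy one-to-one alignment: accept the best-scoring field pairs first."""
    scored = sorted(
        ((combined_similarity(fa, fb), i, j)
         for i, fa in enumerate(record_a)
         for j, fb in enumerate(record_b)),
        reverse=True,
    )
    used_a, used_b, pairs = set(), set(), []
    for score, i, j in scored:
        if score < threshold:
            break  # remaining candidates are too dissimilar to align
        if i in used_a or j in used_b:
            continue  # each field is aligned at most once
        used_a.add(i)
        used_b.add(j)
        pairs.append((i, j))
    return sorted(pairs)


# Hypothetical records: each field is (tag path, text value).
record_a = [(("tr", "td", "b"), "Data Mining"),
            (("tr", "td", "span"), "$45.00")]
record_b = [(("tr", "td", "b"), "Data Mining Concepts"),
            (("tr", "td", "i"), "Morgan Kaufmann"),
            (("tr", "td", "span"), "$52.10")]

print(align_pair(record_a, record_b))  # [(0, 0), (1, 2)]
```

In this sketch the title and price fields are aligned across the two records, while the extra publisher field in the second record is left unmatched. A holistic step would then merge such pairwise alignments across all records into table columns.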