

Case Study 2: Data Extraction of Spanish Yellow Pages
 
A company wants to extract search results from the Elmundo.es Yellow Pages and save them to its database, so that it can quickly look up information about any company. The search keywords are read from a user-generated Excel file.
 

A search can use any combination of the following search criteria fields: "Actividad" (business activity), "Empresa" (company), "Provincia" (province), "Localidad" (town) and "Cod. Postal" (postal code).
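As a rough illustration of how one row of the keyword file might map onto these fields, the Python sketch below keeps only the fields that are filled in and rejects a row with none. It is not the Newbie Script used in this case study. It assumes the Excel sheet has one column per search field, headed with the field names above, and has been exported to CSV. The function names `criteria_from_row` and `read_keyword_rows` are hypothetical.

```python
import csv
import io

# The five search fields offered by the Yellow Pages search form.
SEARCH_FIELDS = ("Actividad", "Empresa", "Provincia", "Localidad", "Cod. Postal")


def criteria_from_row(row):
    """Return only the search fields that have a non-blank value in this row."""
    criteria = {}
    for field in SEARCH_FIELDS:
        # Missing columns come back as None, blank cells as "".
        value = (row.get(field) or "").strip()
        if value:
            criteria[field] = value
    if not criteria:
        raise ValueError("at least one search field must be filled in")
    return criteria


def read_keyword_rows(text):
    """Turn every row of the keyword file (a CSV stand-in for Excel) into criteria."""
    return [criteria_from_row(row) for row in csv.DictReader(io.StringIO(text))]


if __name__ == "__main__":
    # Hypothetical keyword file with two searches.
    sample = (
        "Actividad,Empresa,Provincia,Localidad,Cod. Postal\n"
        "restaurantes,,Madrid,,\n"
        ",Telefonica,,,28013\n"
    )
    for criteria in read_keyword_rows(sample):
        print(criteria)
    # {'Actividad': 'restaurantes', 'Provincia': 'Madrid'}
    # {'Empresa': 'Telefonica', 'Cod. Postal': '28013'}
```

Dropping blank fields, rather than submitting them empty, reflects the fact that any combination of fields may be used.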

Each search result contains the company name, a description, the URL of its website and a hyperlink to additional information about the company.
Each search result also has a "Ver Ficha" ("view record") hyperlink whose target depends on the information available for the company. It may simply link to the company's website, or it may open a new browser window showing "Información de la empresa" (company information) and "Direcciones" (addresses). The difficult part is telling these two cases apart and, when the second one applies, extracting the details.
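One possible way to tell the two "Ver Ficha" cases apart, once the linked page has been fetched, is to look for the two section headings in its text. The Python sketch below does this with the standard library's `html.parser`. It assumes each heading appears as its own text node, and the names `TextCollector` and `parse_ficha` are hypothetical, since the real page markup is not described here. Fetching the page is left out.

```python
from html.parser import HTMLParser

# Section headings shown when "Ver Ficha" opens the detail window.
INFO_HEADING = "Información de la empresa"
ADDR_HEADING = "Direcciones"


class TextCollector(HTMLParser):
    """Collect the non-empty text nodes of a page, skipping scripts and styles."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def parse_ficha(html):
    """Classify a "Ver Ficha" page and, for detail pages, group text by section."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()

    sections = {}
    current = None
    for chunk in collector.chunks:
        if chunk in (INFO_HEADING, ADDR_HEADING):
            current = chunk  # a heading starts a new section
            sections[current] = []
        elif current is not None:
            sections[current].append(chunk)

    if INFO_HEADING in sections and ADDR_HEADING in sections:
        return {
            "scenario": "details",
            "info": sections[INFO_HEADING],
            "addresses": sections[ADDR_HEADING],
        }
    # No detail sections: assume the link led to the company's own website.
    return {"scenario": "website"}
```

Detecting the case from page content, rather than from the shape of the link, avoids guessing how the site builds its URLs, at the cost of fetching every linked page.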
Newbie Scripts were developed to handle this task. The first script extracts all search results into an Excel file. The second script reads that Excel file, extracts the detailed information for each company and saves it to an Access database.
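The two-stage design can be sketched in Python with standard-library stand-ins: a CSV file in place of the Excel workbook and SQLite in place of the Access database. This is an illustration under those assumptions, not the scripts used in the case study. The column names, the `companies` table and the `fetch_details` callback (which would visit a "Ver Ficha" link and return the extracted details, or `None`) are all hypothetical.

```python
import csv
import io
import sqlite3

# Assumed layout of the intermediate results file.
COLUMNS = ["name", "description", "website", "ficha_url"]


def write_results(results, out):
    """Stage 1: write every search result as one row of the results file."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(results)


def load_into_db(src, conn, fetch_details):
    """Stage 2: read the results back, fetch details per company, store them."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS companies "
        "(name TEXT, description TEXT, website TEXT, details TEXT)"
    )
    for row in csv.DictReader(src):
        # Rows without a "Ver Ficha" link get no details (NULL in the table).
        details = fetch_details(row["ficha_url"]) if row["ficha_url"] else None
        conn.execute(
            "INSERT INTO companies VALUES (?, ?, ?, ?)",
            (row["name"], row["description"], row["website"], details),
        )
    conn.commit()


if __name__ == "__main__":
    results_file = io.StringIO()  # stands in for the Excel file on disk
    write_results(
        [
            # Hypothetical search results.
            {"name": "Ejemplo S.A.", "description": "Restaurante",
             "website": "http://www.ejemplo.es", "ficha_url": "http://example.invalid/ficha/1"},
            {"name": "Otra S.L.", "description": "Fontanería",
             "website": "", "ficha_url": ""},
        ],
        results_file,
    )
    results_file.seek(0)
    conn = sqlite3.connect(":memory:")
    # Hypothetical callback returning a fixed address instead of scraping.
    load_into_db(results_file, conn, lambda url: "Calle Mayor 1, Madrid")
    for record in conn.execute("SELECT * FROM companies ORDER BY rowid"):
        print(record)
```

One benefit of keeping an intermediate file between the stages is that the slower detail extraction can be rerun without repeating the searches.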
 
Copyright © 2007 Newbie. All Rights Reserved • ©2007 Microsoft Corporation. Windows Vista is a registered trademark of Microsoft Corporation.