Case Study 2: Data Extraction of Spanish Yellow Pages
A company wants to extract search results from the
Elmundo.es Yellow Pages and save them to its own
database, so that it can quickly look up information
about any company. The search keywords are read from
a user-generated Excel file.
Searching is accomplished by specifying
any combination of the following search
criteria fields: "Actividad" (activity), "Empresa"
(company), "Provincia" (province), "Localidad"
(town) and "Cod. Postal" (postal code).
Each search result contains the company
name, a description, the URL of its website and
a hyperlink to additional information about
the company.
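As an illustration of the "any combination" rule, the following is a minimal sketch that builds a search URL from whichever criteria are filled in. The base URL and the query parameter names are placeholders invented for this example; the case study does not state the ones used by the real site.

```python
from urllib.parse import urlencode

# Hypothetical mapping from the form labels to query parameter names;
# the real site's parameter names are not given in this case study.
FIELD_PARAMS = {
    "Actividad": "actividad",
    "Empresa": "empresa",
    "Provincia": "provincia",
    "Localidad": "localidad",
    "Cod. Postal": "cp",
}

# Placeholder base URL, not the real search endpoint.
BASE_URL = "https://example.invalid/buscar"


def build_search_url(criteria):
    """Return a search URL that uses only the criteria that were filled in."""
    params = {}
    for label, value in criteria.items():
        if label not in FIELD_PARAMS:
            raise KeyError(f"unknown search field: {label}")
        text = "" if value is None else str(value).strip()
        if text:  # skip blank cells so any combination of fields is accepted
            params[FIELD_PARAMS[label]] = text
    if not params:
        raise ValueError("at least one search criterion is required")
    return f"{BASE_URL}?{urlencode(params)}"
```

A row read from the keyword spreadsheet could be passed in as a dictionary, for example `build_search_url({"Actividad": "hoteles", "Localidad": "Madrid"})`; empty cells are simply ignored.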
Each search result has a "Ver Ficha" ("view record")
hyperlink whose target varies with the information
available for the company. It may lead only to the
company's website, or, when clicked, it may open a new
browser window showing "Información de la empresa"
(company information) and "Direcciones" (addresses).
The difficulty lies in telling the two cases apart
and extracting the details when the second case
applies.
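One way to tell the two cases apart is to look for the two section titles in the page that the link opens. The sketch below uses Python's standard `html.parser` for this. It assumes each title appears as its own text node and that the details follow it as plain text; the real page markup is not described in this case study, so `extract_ficha` and its heuristics are hypothetical.

```python
from html.parser import HTMLParser

# Section titles named in the case study.
SECTION_TITLES = ("Información de la empresa", "Direcciones")


class _TextCollector(HTMLParser):
    """Collect the non-empty text chunks of a page in document order."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def extract_ficha(html):
    """Return {section title: [text lines]}, or None if no detail section exists.

    None corresponds to the first case (the link leads only to the company's
    website); a dictionary corresponds to the second case.
    """
    collector = _TextCollector()
    collector.feed(html)
    collector.close()

    sections = {}
    current = None
    for chunk in collector.chunks:
        if chunk in SECTION_TITLES:
            # A title starts a new section; later text is filed under it.
            current = sections.setdefault(chunk, [])
        elif current is not None:
            current.append(chunk)
    return sections or None
```

Returning `None` for the first case keeps the caller's logic simple: a result with no detail sections is stored with its website URL only.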
Newbie Scripts were developed to handle this operation. The first script extracts all search results into
an Excel file. The second script reads that Excel file,
extracts the detailed information for each result and
saves it to an Access database.
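The two-stage design can be sketched in Python as follows. This is not the Newbie Scripts implementation: to stay self-contained it substitutes a CSV file for the Excel file and SQLite for the Access database, and the column names, the `companies` table and the `fetch_details` callback are all invented for the example.

```python
import csv
import io
import sqlite3

# Hypothetical column names for the intermediate file.
FIELDS = ["name", "description", "website", "ficha_url"]


def write_results(results, out_file):
    """Stage 1: save the search results as rows of an intermediate file."""
    writer = csv.DictWriter(out_file, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(results)


def load_details(in_file, conn, fetch_details):
    """Stage 2: read the intermediate file, look up details, store them."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS companies "
        "(name TEXT, description TEXT, website TEXT, details TEXT)"
    )
    rows = [
        (row["name"], row["description"], row["website"],
         fetch_details(row["ficha_url"]))
        for row in csv.DictReader(in_file)
    ]
    with conn:  # commit all inserts as one transaction
        conn.executemany("INSERT INTO companies VALUES (?, ?, ?, ?)", rows)
    return len(rows)


def demo():
    """Run both stages in memory with made-up data."""
    results = [{
        "name": "Acme S.L.",
        "description": "Ferretería",
        "website": "http://www.example.com",
        "ficha_url": "ficha-1",
    }]
    buffer = io.StringIO()  # stands in for the Excel file
    write_results(results, buffer)
    buffer.seek(0)
    conn = sqlite3.connect(":memory:")  # stands in for the Access database
    load_details(buffer, conn, lambda url: f"details for {url}")
    return conn.execute("SELECT name, details FROM companies").fetchall()
```

Splitting the work this way means the slower detail lookups can be rerun from the saved intermediate file without repeating the searches.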