Statistics Portugal - Web Portal

Autocomplete API

API developed by Statistics Portugal for use in the automatic coding process (via autocomplete) of the variables Occupation, Higher Education and Economic Activity.

The API allows any user developing a WEB form to invoke it and, through it, obtain a coding harmonized with Statistics Portugal nomenclatures, instead of developing and using their own codifications and aggregations.

Introduction

The autocomplete API is a backend service that returns a list of the most likely suggestions to complete an initial input.

The API follows REST (Representational State Transfer) principles. Because it only performs searches, only the GET method is available.

Access

API root URL

All API calls start with the URL: https://apife.ine.pt

Positioning URL

From the service consumer’s perspective, the root URL is followed by the segment “dic”, which indicates that a dictionary is being accessed, and then by the identifier of the dictionary to be used in the autocomplete.

At this stage, the available dictionaries are:

Occupation: https://apife.ine.pt/dic/CPP2010
Economic Activity: https://apife.ine.pt/dic/CAEREV3
Higher Education: https://apife.ine.pt/dic/CURSOSUPC2021
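
As an illustration of how the root URL, the “dic” segment and the dictionary identifier fit together, here is a minimal Python sketch. The names API_ROOT, DICTIONARIES and dictionary_url, as well as the lookup keys, are assumptions made for this example and are not part of the API.

```python
# Root URL and dictionary identifiers as listed above.
API_ROOT = "https://apife.ine.pt"

# The keys are hypothetical labels chosen for this sketch.
DICTIONARIES = {
    "occupation": "CPP2010",
    "economic_activity": "CAEREV3",
    "higher_education": "CURSOSUPC2021",
}


def dictionary_url(variable: str) -> str:
    """Return the positioning URL for one of the known dictionaries."""
    return f"{API_ROOT}/dic/{DICTIONARIES[variable]}"
```

An unknown label raises KeyError, which keeps a typo from silently producing a URL for a dictionary that does not exist.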

Use

There are two use cases available:

Prefetch

/prefetch (https://apife.ine.pt/dic/{dictionary_id}/prefetch)

For the identified dictionary, returns a list of the most frequent entries. It can be invoked and cached on the autocomplete client.
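
The client-side caching mentioned above could look roughly like the following Python sketch. The class PrefetchCache and the helper http_get_json are hypothetical names, and the in-memory, never-expiring cache is an assumption of this example rather than a recommendation from the API.

```python
import json
from urllib.request import urlopen


def http_get_json(url: str, timeout: float = 10.0):
    """Perform a GET request and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


class PrefetchCache:
    """Fetch the prefetch list once per dictionary and reuse it afterwards."""

    def __init__(self, fetch=http_get_json):
        # The fetch function is injectable so the cache can be tried offline.
        self._fetch = fetch
        self._entries = {}

    def get(self, dictionary_id: str):
        if dictionary_id not in self._entries:
            url = f"https://apife.ine.pt/dic/{dictionary_id}/prefetch"
            self._entries[dictionary_id] = self._fetch(url)
        return self._entries[dictionary_id]
```

Calling PrefetchCache().get("CPP2010") twice would issue a single request. A real client would probably also want an expiry policy, which is left out here.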

Search

?q=XXXX (https://apife.ine.pt/dic/{dictionary_id}/?q={query_text})

Example: https://apife.ine.pt/dic/CPP2010/?q=baila
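
User input typed into a form may contain spaces or accented characters, so the query text should be percent-encoded before it is placed in the URL. A minimal Python sketch follows; search_url is a hypothetical helper name, and encoding spaces as %20 rather than + is an assumption of this example.

```python
from urllib.parse import quote, urlencode


def search_url(dictionary_id: str, query_text: str) -> str:
    """Build the search URL for a dictionary, percent-encoding the query text."""
    # quote_via=quote encodes spaces as %20 instead of the default "+".
    query = urlencode({"q": query_text}, quote_via=quote)
    return f"https://apife.ine.pt/dic/{dictionary_id}/?{query}"
```

For the example above, search_url("CPP2010", "baila") reproduces the URL shown.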

Structure

Both prefetch and search return a JSON array of objects with the following structure:

[ { "c": "AAA", "d": "BBBB", "t": "CCCCC" }, … ]

In each element:

“c” contains the code;
“d” the designation to be presented as a suggestion;
“t” a string of words separated by spaces called tokens.

The order of the elements in the array reflects their ordering by relevance (most relevant first).
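
To make the three fields concrete, the following Python sketch parses a response body into typed records while preserving the relevance order. The Suggestion class and its attribute names are assumptions of this example; only the keys “c”, “d” and “t” come from the API.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    code: str           # "c": the code
    designation: str    # "d": the text to present as a suggestion
    tokens: tuple       # "t": the space-separated tokens, split into words


def parse_suggestions(payload: str) -> list:
    """Parse a prefetch or search response, keeping the relevance order."""
    return [
        Suggestion(
            code=item["c"],
            designation=item["d"],
            tokens=tuple(item["t"].split()),
        )
        for item in json.loads(payload)
    ]
```

Because the list order is kept as received, the first element is always the most relevant suggestion.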

Dictionaries

Besides the official coding lists (CAE Rev3, CPP 2010, CNAEF), the Dictionaries are built on the entire history of manual coding from more than 30 statistical operations carried out over about 8 years within the scope of Household Surveys. At the time, the total number of interviews conducted exceeded 600,000.

Expressions were considered eligible to enrich the classifiers if they had (1) a frequency equal to or greater than 10 and a coding consistency of 90%, or (2) a frequency equal to or greater than 5 and a coding consistency of 100%. Then, a distance metric was calculated between the expressions already existing in the classifier and the rest of the history. The Optimal String Alignment distance, an extension of the Levenshtein measure, was used, considering distances in the range of 1 to 3. After validation, expressions equivalent in meaning but distinct in spelling were integrated into the Dictionaries.
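
The eligibility thresholds and the distance step described above can be sketched in Python as follows. This is an illustration, not the code used by Statistics Portugal. It computes the optimal string alignment (OSA) distance, which is the Levenshtein distance plus transpositions of adjacent characters. The function names are hypothetical, and reading "consistency of 90%" as "at least 90%" is an assumption.

```python
def is_eligible(frequency: int, consistency: float) -> bool:
    """Apply the two eligibility rules; consistency is a fraction in [0, 1].

    Assumption: "consistency of 90%" is read as "at least 90%".
    """
    return (frequency >= 10 and consistency >= 0.90) or (
        frequency >= 5 and consistency >= 1.0
    )


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between two strings."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def near_matches(expression: str, candidates: list, low: int = 1, high: int = 3) -> list:
    """Return the candidates whose OSA distance to expression is within [low, high]."""
    return [c for c in candidates if low <= osa_distance(expression, c) <= high]
```

A lower bound of 1 excludes exact duplicates, which are already in the classifier. The candidates returned would still need the validation step mentioned above before being added to a dictionary.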

Figure 1 - Dictionary Creation Schema

Nomenclatures

As mentioned, the API classifies expressions based on three nomenclatures:

Occupation: https://apife.ine.pt/dic/CPP2010
Economic Activity: https://apife.ine.pt/dic/CAEREV3
Higher Education: https://apife.ine.pt/dic/CURSOSUPC2021

For the classification of Occupations, the SMI Version used is: V02014 - Portuguese Classification of Professions, CPP 2010, available at: https://smi.ine.pt/Versao/Detalhes/2014?modal=1

For the classification of Economic Activity, the SMI Version used is: V00554 - Portuguese Classification of Economic Activities, Rev. 3, available at: https://smi.ine.pt/Versao/Detalhes/554?modal=1

For the classification of Higher