Autocomplete API
API developed by Statistics Portugal for use in the automatic coding process (via autocomplete) of the variables Occupation, Higher Education and Economic Activity.
The API can be invoked by anyone developing a web form to obtain codes harmonized with the Statistics Portugal nomenclatures, instead of developing and maintaining their own coding schemes and aggregations.
Introduction
The API is based on a REST (Representational State Transfer) access principle, but since only searches are carried out, only the GET method is available.
Access
API root URL
Positioning URL
From the service consumer's perspective, the API root URL is followed by the segment "dic" (indicating that a dictionary is to be accessed) and then by the identifier segment of the dictionary to be used in the autocomplete.
At this stage, the available dictionaries are:
Use
There are two use cases available:
Prefetch
/prefetch (https://apife.ine.pt/dic/{dictionary_id}/prefetch)
For the identified dictionary, returns a list of the most frequent entries. It can be invoked and cached on the autocomplete client.
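As a minimal sketch of the "invoke once and cache" pattern described above, the following Python fragment fetches the prefetch list for a dictionary and keeps it in memory. The base URL is taken from the URLs shown in this document; the function names, the in-memory cache, the `fetch` hook and the UTF-8 decoding are assumptions made for illustration, not part of the API.

```python
import json
from urllib.request import urlopen

# Base path as it appears in the URLs of this document.
BASE_URL = "https://apife.ine.pt/dic"

# Hypothetical in-memory cache: dictionary_id -> list of prefetched entries.
_prefetch_cache = {}


def prefetch_url(dictionary_id):
    """Build the prefetch URL for one dictionary."""
    return f"{BASE_URL}/{dictionary_id}/prefetch"


def get_prefetch(dictionary_id, fetch=None):
    """Return the prefetch list, calling the API only on the first request.

    `fetch` is a hypothetical hook (URL -> response text) that lets the
    network call be replaced; by default a plain GET is performed.
    """
    if dictionary_id not in _prefetch_cache:
        if fetch is None:
            def fetch(url):
                # Assumption: the response body is UTF-8 encoded JSON.
                with urlopen(url, timeout=10) as response:
                    return response.read().decode("utf-8")
        body = fetch(prefetch_url(dictionary_id))
        _prefetch_cache[dictionary_id] = json.loads(body)
    return _prefetch_cache[dictionary_id]
```

Calling `get_prefetch("CPP2010")` twice would issue a single GET request; the second call is served from the cache.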
Search
?q=XXXX (https://apife.ine.pt/dic/{dictionary_id}/?q={query_text})
https://apife.ine.pt/dic/CPP2010/?q=baila
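Because the text typed by a user may contain spaces or accented characters, the query should be percent-encoded before being placed in the URL. The sketch below builds a search URL following the pattern shown above; the helper name `search_url` is hypothetical, and the use of UTF-8 percent-encoding is an assumption, since this document does not state which encoding the service expects.

```python
from urllib.parse import quote


def search_url(dictionary_id, query_text):
    """Build a search URL of the form .../dic/{dictionary_id}/?q={query_text}."""
    # quote() percent-encodes using UTF-8 by default (assumed to be accepted).
    encoded_id = quote(dictionary_id, safe="")
    encoded_query = quote(query_text, safe="")
    return f"https://apife.ine.pt/dic/{encoded_id}/?q={encoded_query}"


# Reproduces the example URL given in this document.
print(search_url("CPP2010", "baila"))
```

The resulting URL is then requested with GET, the only method the API offers.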
Structure
Prefetch and search return arrays in JSON with objects having the structure:
[ { c: "AAA", d: "BBBB", t: "CCCCC" }, … ]
In each element:
The order of the elements in the array reflects their ordering by relevance (most relevant first).
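A client might decode a response along the following lines. This is only a sketch: the `Entry` class and `parse_entries` function are hypothetical names, the field names `c`, `d` and `t` are kept exactly as in the structure above, and it is assumed that the actual response is standard JSON with string values. The list order is preserved, so the first element remains the most relevant one.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One element of a prefetch or search response (fields named as in the API)."""
    c: str
    d: str
    t: str


def parse_entries(payload):
    """Decode a JSON array of objects into Entry instances, keeping their order."""
    return [Entry(c=item["c"], d=item["d"], t=item["t"]) for item in json.loads(payload)]


# Placeholder values taken from the structure shown above, not real data.
sample = '[{"c": "AAA", "d": "BBBB", "t": "CCCCC"}]'
entries = parse_entries(sample)
best_match = entries[0] if entries else None
```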
Dictionaries
Besides the official coding lists (CAE Rev3, CPP 2010, CNAEF), the Dictionaries are built on the entire history of manual coding from more than 30 statistical operations carried out over about 8 years within the scope of Household Surveys. At the time, the total number of interviews conducted exceeded 600,000. Expressions were considered eligible to enrich the classifiers if they had (1) a frequency equal to or greater than 10 and a coding consistency of 90%, or (2) a frequency equal to or greater than 5 and a coding consistency of 100%. Then, a distance metric was calculated between the expressions already in the classifier and the rest of the history. The Optimal String Alignment distance (an extension of the Levenshtein measure) was used, considering distances in the range 1 to 3. After validation, expressions equivalent in meaning but distinct in spelling were integrated into the Dictionaries.
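To illustrate the distance step, here is a textbook implementation of the Optimal String Alignment distance, which extends Levenshtein with the transposition of two adjacent characters. It is a sketch of the general technique, not the code used by Statistics Portugal; the helper `is_candidate` and its default thresholds (mirroring the 1 to 3 range mentioned above) are hypothetical.

```python
def osa_distance(a, b):
    """Optimal String Alignment distance between strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def is_candidate(expression, known, low=1, high=3):
    """Hypothetical filter: keep pairs whose distance lies between low and high."""
    return low <= osa_distance(expression, known) <= high
```

For example, `osa_distance("bailarino", "bialarino")` counts the swapped "ai"/"ia" as a single edit, where plain Levenshtein would count two.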
Figure 1 - Dictionary Creation Schema
Nomenclatures
As mentioned, the API classifies expressions based on three nomenclatures:
For the classification of Occupations, the SMI Version used is: V02014 - Portuguese Classification of Professions, CPP 2010, available at: https://smi.ine.pt/Versao/Detalhes/2014?modal=1
For the classification of Economic Activity, the SMI Version used is: V00554 - Portuguese Classification of Economic Activities, Rev. 3, available at: https://smi.ine.pt/Versao/Detalhes/554?modal=1
For the classification of Higher