• Article

      A clustering approach to extract data from HTML tables 

      Jiménez Aguirre, Patricia; Roldán Salvador, Juan Carlos; Corchuelo Gil, Rafael (Elsevier, 2021)
      HTML tables have become pervasive on the Web. Extracting their data automatically is difficult because finding the ...
    • Article

      A coral-reef approach to extract information from HTML tables 

      Jiménez Aguirre, Patricia; Roldán Salvador, Juan Carlos; Corchuelo Gil, Rafael (Elsevier, 2022)
      This article presents Coraline, which is a new table-understanding proposal. Its novelty lies in a coral-reef optimisation ...
    • Article

      A hybrid quantum approach to leveraging data from HTML tables 

      Jiménez Aguirre, Patricia; Roldán Salvador, Juan Carlos; Corchuelo Gil, Rafael (Springer, 2022)
      The Web provides many data that are encoded using HTML tables. This facilitates rendering them, but obfuscates their ...
    • Article

      An approach for discovering keywords from Spanish tweets using Wikipedia 

      Ayala Hernández, Daniel; Roldán Salvador, Juan Carlos; Ruiz Cortés, David; Ortega Gallego, Fernando (Universidad de Salamanca, 2015)
      Most approaches to keywords discovery when analyzing microblogging messages (among them those from Twitter) are based on ...
    • Doctoral Thesis

      Enterprise Data Integration: On Extracting Data from HTML Tables 

      Roldán Salvador, Juan Carlos (2020-12-22)
      The Web is a universal communication channel that provides a vast amount of valuable data about a plethora of topics. In ...
    • Conference Paper

      Extracting Web Information using Representation Patterns 

      Roldán Salvador, Juan Carlos; Jiménez Aguirre, Patricia; Corchuelo Gil, Rafael (Association for Computing Machinery (ACM), 2017)
      Feeding decision support systems with Web information typically requires sifting through an unwieldy amount of information ...
    • Article

      On exploring data lakes by finding compact, isolated clusters 

      Jiménez Aguirre, Patricia; Roldán Salvador, Juan Carlos; Corchuelo Gil, Rafael (Elsevier, 2022)
      Data engineers are very interested in data lake technologies due to the incredible abundance of datasets. They typically ...
    • Article

      On Extracting Data from Tables that are Encoded using HTML 

      Roldán Salvador, Juan Carlos; Jiménez Aguirre, Patricia; Corchuelo Gil, Rafael (Elsevier, 2020)
      Tables are a common means to display data in human-friendly formats. Many authors have worked on proposals to extract ...
    • Article

      On the synthesis of metadata tags for HTML files 

      Jiménez Aguirre, Patricia; Roldán Salvador, Juan Carlos; Gallego, Fernando O.; Corchuelo Gil, Rafael (Wiley, 2020)
      RDFa, JSON-LD, Microdata, and Microformats make it possible to endow the data in HTML files with metadata tags that help software ...
    • Article

      TOMATE: A heuristic-based approach to extract data from HTML tables 

      Roldán Salvador, Juan Carlos; Jiménez Aguirre, Patricia; Szekely, Pedro; Corchuelo Gil, Rafael (Elsevier, 2021)
      Extracting data from user-friendly HTML tables is difficult because of their different layouts, formats, and encoding ...