Rizzo, Stefano Giovanni (2017) Temporal Dimension of Text: Quantification, Metrics and Features, [Dissertation thesis], Alma Mater Studiorum Università di Bologna. PhD programme in Computer science and engineering, 29th cycle. DOI 10.6092/unibo/amsdottorato/8004.
      Abstract
      The time dimension is so inherently bound to any information space that it can hardly be ignored when describing reality, nor can it be disregarded when interpreting most information. Given the pressing need to search and classify ever larger amounts of unstructured data with better accuracy, the temporal dimension of text documents is becoming a crucial property for information retrieval and text mining tasks.
Of all the features that characterize textual information, the time dimension is still not fully exploited, despite its richness and diversity. Temporal information retrieval is still in its infancy, and the time features of documents are barely taken into account in text classification.
The temporal aspects of text can be used to better interpret the relative truthfulness and the context of old information, and to determine the relevance of a document with respect to information needs and categories.
In this research, we first explore the temporal dimension of text collections in a large-scale study of more than 30 million documents, quantifying its extent and showing its peculiarities and patterns, such as the relation between the creation time of documents and the time they mention.
We then define a comprehensive and accurate representation of the temporal aspects of documents, modeling ad hoc temporal similarities based on metric distances between time intervals.
Evaluation results show that taking into account the temporal relevance of documents yields a significant improvement in retrieval effectiveness, for both implicit and explicit time queries, and a gain in classification accuracy when temporal features are involved.
By defining a set of temporal features that comprehensively describe the temporal scope of text documents, we show their significant relation to topical categories and how these features are able to categorize documents, improving text categorization when combined with ordinary term-frequency features.
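The abstract does not give the thesis's actual distance definitions, so the following is only a rough illustration of what "temporal similarity based on metric distances between time intervals" can look like. It assumes that each interval is a pair of day numbers, uses the Manhattan (L1) distance between interval endpoints as the metric, turns that distance into a similarity through a hypothetical `scale_days` parameter, and scores a document by its best-matching mentioned interval. The names `Interval`, `manhattan_distance`, `similarity`, and `temporal_score` are illustrative and do not come from the thesis.

```python
from datetime import date
from typing import NamedTuple, Sequence


class Interval(NamedTuple):
    """Closed time interval [start, end], stored as day ordinals (assumed representation)."""
    start: int
    end: int

    @classmethod
    def from_dates(cls, start: date, end: date) -> "Interval":
        if end < start:
            raise ValueError("interval end precedes its start")
        return cls(start.toordinal(), end.toordinal())


def manhattan_distance(a: Interval, b: Interval) -> int:
    """L1 distance between intervals seen as (start, end) points; it satisfies the metric axioms."""
    return abs(a.start - b.start) + abs(a.end - b.end)


def similarity(a: Interval, b: Interval, scale_days: float = 365.0) -> float:
    """Map a distance in days to (0, 1]; identical intervals score 1.0.

    scale_days is a hypothetical tuning parameter: a distance of scale_days gives 0.5.
    """
    return 1.0 / (1.0 + manhattan_distance(a, b) / scale_days)


def temporal_score(query: Interval, doc_intervals: Sequence[Interval]) -> float:
    """Score a document by its best-matching mentioned interval; 0.0 if it mentions none."""
    return max((similarity(query, iv) for iv in doc_intervals), default=0.0)


if __name__ == "__main__":
    # Hypothetical query about the year 2001 against a document mentioning two intervals.
    query = Interval.from_dates(date(2001, 1, 1), date(2001, 12, 31))
    doc = [
        Interval.from_dates(date(2001, 9, 11), date(2001, 9, 11)),  # a single day in 2001
        Interval.from_dates(date(1990, 1, 1), date(1990, 12, 31)),  # the whole of 1990
    ]
    print(round(temporal_score(query, doc), 3))
```

The L1 endpoint distance penalizes shifts of both the start and the end, so a single day inside a year-long query interval still counts as some distance away. Other interval metrics, such as the maximum of the two endpoint differences, would fit the same scoring scheme.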
     

      Document type
      PhD thesis

      Author
      Rizzo, Stefano Giovanni

      PhD programme
      Computer science and engineering

      Cycle
      29

      Keywords
      Time, dimension, temporal expressions, timex, information retrieval, text categorization, feature engineering, machine learning, New York Times, Wikipedia, metric distances, time intervals, time quantification, temporal queries, content-level time, relative time

      DOI
      10.6092/unibo/amsdottorato/8004

      Date of defense
      15 May 2017