A comprehensive tool for text categorization and text summarization in bioinformatics

Md Mustofa Kamal, Kazi Zakia Sultana

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Research; peer-review

1 Citation (Scopus)

Abstract

This work integrates text categorization and text summarization using existing algorithms. We apply the method to bioinformatics literature, categorizing documents into relevant domains of bioinformatics and then producing a summarized overview of each document in the domain. For text categorization we have chosen three distinct core domains of bioinformatics: Protein-Protein Interaction, Disease-Drug Relevance, and Pathway-Process Involvement. The method uses a TF-IDF based technique for the categorization task and, after categorization, summarizes the key contents of each document using existing features. The system plays an important role in automatically narrowing the review space for researchers, as they do not need to select relevant texts manually. It also saves time by providing ranked, highly relevant lines from the documents. Our method outperforms other existing summarization tools in the sense that it optimizes summarization by first categorizing the documents on the basis of TF-IDF and then avoids redundant information by ranking the sentences with an existing score.
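The categorization step described in the abstract can be illustrated with a minimal sketch. The abstract only states that TF-IDF is used, so everything else here is an assumption made for illustration: the seed vocabulary in `SEED_TEXT`, the smoothed IDF formula, the tokenizer, and the use of cosine similarity against one profile per category are hypothetical choices, not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs; a deliberately simple tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def build_idf(docs_tokens):
    """Smoothed inverse document frequency over a list of token lists."""
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(tokens, idf):
    """Relative term frequency times IDF; terms without an IDF are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    total = sum(tf.values()) or 1
    return {t: (c / total) * idf[t] for t, c in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical seed text per category (an assumption, not from the paper).
SEED_TEXT = {
    "Protein-Protein Interaction":
        "protein binds protein complex interaction binding partner dimer",
    "Disease-Drug Relevance":
        "drug treatment disease patients therapy dose inhibitor clinical",
    "Pathway-Process Involvement":
        "pathway signaling process regulation cascade activation involved",
}


def build_classifier(seed_text):
    """Return the IDF table and one TF-IDF profile per category."""
    labels = list(seed_text)
    tokens = [tokenize(seed_text[label]) for label in labels]
    idf = build_idf(tokens)
    profiles = {label: tfidf_vector(t, idf) for label, t in zip(labels, tokens)}
    return idf, profiles


def categorize(document, idf, profiles):
    """Pick the category whose profile is most similar to the document.

    If the document shares no term with any profile, every similarity is 0
    and the first category is returned; a real system would need a fallback.
    """
    vec = tfidf_vector(tokenize(document), idf)
    return max(profiles, key=lambda label: cosine(vec, profiles[label]))


idf, profiles = build_classifier(SEED_TEXT)
print(categorize("The drug reduced disease symptoms in patients", idf, profiles))
```

In practice the category profiles would presumably be built from labeled abstracts rather than a handful of seed words; the sketch only shows how TF-IDF weights can drive the assignment to one of the three domains.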

Original language: English
Title of host publication: Proceeding of the 15th International Conference on Computer and Information Technology, ICCIT 2012
Pages: 592-597
Number of pages: 6
DOIs: https://doi.org/10.1109/ICCITechn.2012.6509764
State: Published - 1 Dec 2012
Event: 15th International Conference on Computer and Information Technology, ICCIT 2012 - Chittagong, Bangladesh
Duration: 22 Dec 2012 to 24 Dec 2012

Publication series

Name: Proceeding of the 15th International Conference on Computer and Information Technology, ICCIT 2012

Conference

Conference: 15th International Conference on Computer and Information Technology, ICCIT 2012
Country: Bangladesh
City: Chittagong
Period: 22/12/12 to 24/12/12

Fingerprint

Bioinformatics
Proteins

Keywords

  • Pathway
  • SumBasic score
  • Text Categorization
  • Text Summarization
  • TF-IDF
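
The keyword "SumBasic score" suggests that the "existing score" mentioned in the abstract is SumBasic, although the abstract does not say so explicitly. Under that assumption, the sentence ranking might look like the sketch below. It is a simplified variant: it scores each sentence by the average probability of its words and squares the probability of words already selected, but it omits the original SumBasic step that restricts candidates to sentences containing the currently most probable word. The function names, the regex-based sentence splitter, and the `max_sentences` parameter are hypothetical.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(sentence):
    """Lowercase alphabetic tokens of a sentence."""
    return re.findall(r"[a-z]+", sentence.lower())


def sumbasic(text, max_sentences=2):
    """Return up to max_sentences sentences in the order they were selected."""
    sentences = split_sentences(text)
    tokenized = [words(s) for s in sentences]
    counts = Counter(w for toks in tokenized for w in toks)
    total = sum(counts.values())
    if total == 0:
        return []
    # Unigram probability of each word over the whole document.
    prob = {w: c / total for w, c in counts.items()}

    def score(i):
        # Average word probability of sentence i under the current estimates.
        return sum(prob[w] for w in tokenized[i]) / len(tokenized[i])

    chosen = []
    remaining = [i for i, toks in enumerate(tokenized) if toks]
    while remaining and len(chosen) < max_sentences:
        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
        # Squaring the probability of covered words makes sentences that
        # repeat them less attractive, which discourages redundancy.
        for w in set(tokenized[best]):
            prob[w] **= 2
    return [sentences[i] for i in chosen]


text = ("Protein A binds protein B. "
        "Protein A binds protein B strongly. "
        "The weather was nice.")
print(sumbasic(text, max_sentences=2))
```

In this toy input the second sentence nearly repeats the first, so after the first pick its score drops below that of the unrelated third sentence; this is the redundancy-avoidance behaviour the abstract refers to.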

Cite this

Kamal, M. M., & Sultana, K. Z. (2012). A comprehensive tool for text categorization and text summarization in bioinformatics. In Proceeding of the 15th International Conference on Computer and Information Technology, ICCIT 2012 (pp. 592-597). [6509764] (Proceeding of the 15th International Conference on Computer and Information Technology, ICCIT 2012). https://doi.org/10.1109/ICCITechn.2012.6509764
Kamal, Md Mustofa ; Sultana, Kazi Zakia. / A comprehensive tool for text categorization and text summarization in bioinformatics. Proceeding of the 15th International Conference on Computer and Information Technology, ICCIT 2012. 2012. pp. 592-597 (Proceeding of the 15th International Conference on Computer and Information Technology, ICCIT 2012).
@inproceedings{e3b674ea923443f39f84c75ccdafed4e,
title = "A comprehensive tool for text categorization and text summarization in bioinformatics",
keywords = "Pathway, SumBasic score, Text Categorization, Text Summarization, TF-IDF",
author = "Kamal, {Md Mustofa} and Sultana, {Kazi Zakia}",
year = "2012",
month = "12",
day = "1",
doi = "10.1109/ICCITechn.2012.6509764",
language = "English",
isbn = "9781467348348",
series = "Proceeding of the 15th International Conference on Computer and Information Technology, ICCIT 2012",
pages = "592--597",
booktitle = "Proceeding of the 15th International Conference on Computer and Information Technology, ICCIT 2012",

}

Kamal, MM & Sultana, KZ 2012, A comprehensive tool for text categorization and text summarization in bioinformatics. in Proceeding of the 15th International Conference on Computer and Information Technology, ICCIT 2012., 6509764, Proceeding of the 15th International Conference on Computer and Information Technology, ICCIT 2012, pp. 592-597, 15th International Conference on Computer and Information Technology, ICCIT 2012, Chittagong, Bangladesh, 22/12/12. https://doi.org/10.1109/ICCITechn.2012.6509764

A comprehensive tool for text categorization and text summarization in bioinformatics. / Kamal, Md Mustofa; Sultana, Kazi Zakia.

Proceeding of the 15th International Conference on Computer and Information Technology, ICCIT 2012. 2012. p. 592-597 6509764 (Proceeding of the 15th International Conference on Computer and Information Technology, ICCIT 2012).


TY - GEN

T1 - A comprehensive tool for text categorization and text summarization in bioinformatics

AU - Kamal, Md Mustofa

AU - Sultana, Kazi Zakia

PY - 2012/12/1

Y1 - 2012/12/1

KW - Pathway

KW - SumBasic score

KW - Text Categorization

KW - Text Summarization

KW - TF-IDF

UR - http://www.scopus.com/inward/record.url?scp=84878101036&partnerID=8YFLogxK

U2 - 10.1109/ICCITechn.2012.6509764

DO - 10.1109/ICCITechn.2012.6509764

M3 - Conference contribution

SN - 9781467348348

T3 - Proceeding of the 15th International Conference on Computer and Information Technology, ICCIT 2012

SP - 592

EP - 597

BT - Proceeding of the 15th International Conference on Computer and Information Technology, ICCIT 2012

ER -

Kamal MM, Sultana KZ. A comprehensive tool for text categorization and text summarization in bioinformatics. In Proceeding of the 15th International Conference on Computer and Information Technology, ICCIT 2012. 2012. p. 592-597. 6509764. (Proceeding of the 15th International Conference on Computer and Information Technology, ICCIT 2012). https://doi.org/10.1109/ICCITechn.2012.6509764