Natural Language Processing and Information Retrieval

Dilara Keküllüoğlu

Onur Varol
Natural Language Processing, Information Retrieval and Extraction

Natural language processing (NLP) is a branch of artificial intelligence (AI) that aims to automate the analysis, generation, and acquisition of natural language. Information retrieval (IR) deals with searching and analyzing large data collections. After a small, relevant subset has been retrieved, information extraction and NLP techniques are applied to pull out useful information.
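The retrieve-then-extract pipeline above can be illustrated with a minimal sketch: score documents against a query with TF-IDF and keep the best-matching ones for downstream extraction. This is a toy model, not the group's actual system; real IR engines add tokenization, stemming, stop-word removal, and inverted indexes.

```python
import math
from collections import Counter

def tf_idf_rank(query, documents):
    """Rank documents against a query with a minimal TF-IDF model.

    Tokenization here is naive lowercase whitespace splitting,
    purely for illustration.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in tokenized for term in set(doc))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term in tf:
                idf = math.log(n_docs / df[term])
                score += (tf[term] / len(doc)) * idf
        scores.append(score)
    # Return document indices, best match first.
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)
```

In a full pipeline, only the top-ranked documents would be passed on to an information-extraction or NLP component, keeping expensive analysis off the bulk of the collection.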

Our research aims to develop state-of-the-art models for retrieving and extracting information from large text collections. We employ machine learning and deep-neural-network-based approaches to help machines understand these collections. In our research group, we mainly focus on social media and news content in different languages, mostly Turkish and English. For more information about our current projects, please visit our lab page.

İnanç Arın

Berrin Yanıkoğlu
Large Language Models and Domain-Specific NLP

Large language models (LLMs) have fundamentally transformed how machines process and generate natural language, enabling unprecedented capabilities in text understanding, reasoning, and generation. However, deploying these models in specialized, high-stakes domains (such as social media, healthcare, legal, and regulatory contexts) introduces critical challenges around domain adaptation, data privacy, and decision explainability, particularly for low-resource languages like Turkish.

Our research focuses on developing domain-adapted, privacy-preserving NLP systems that can operate reliably in sensitive, real-world environments. We combine neural and symbolic approaches to build hybrid architectures where LLMs are augmented with knowledge graphs and deterministic rule engines, enabling transparent and auditable AI decisions. Our work spans efficient fine-tuning of large models for specialized domains, automatic de-identification of sensitive texts, and social media language understanding, including hate speech and toxic content detection across multiple languages.
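To make the deterministic-rule-engine side of such a hybrid concrete, here is a minimal, hypothetical sketch of rule-based de-identification: regex patterns replace sensitive spans with category placeholders, so every masking decision is reproducible and auditable. The pattern names and rules are illustrative assumptions, not the group's actual system, which would combine a far richer rule set with model-based named-entity recognition.

```python
import re

# Illustrative patterns only (assumed for this sketch); a production
# de-identifier would use a broader, carefully validated rule set.
PII_RULES = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")),
    ("PHONE", re.compile(r"\b\d{3}[- ]?\d{3}[- ]?\d{4}\b")),
    ("TC_ID", re.compile(r"\b\d{11}\b")),  # Turkish national ID: 11 digits
]

def deidentify(text):
    """Replace each matched span with its category placeholder.

    Rules are applied in order; because every rule is a deterministic
    regex, the output is fully auditable: each masked span can be
    traced back to exactly one rule.
    """
    for label, pattern in PII_RULES:
        text = pattern.sub(f"[{label}]", text)
    return text
```

A deterministic layer like this can veto or post-process LLM output, which is one way hybrid neural-symbolic systems keep sensitive decisions transparent.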