
Background and Motivation

As European cities increasingly aim to make data-driven policy decisions, there is a growing need to understand latent “data demand” from public documents such as council meeting notes, mobility strategies, urban development plans, and public tenders. The challenge is not only to identify and structure this demand, but also to assess how such demand can realistically be matched with available data supply.

In the long term, the goal is to automate demand-supply matching. In the near term, however, the focus lies on creating insights for market makers (the users who facilitate and evaluate data exchange) about:

  • How feasible it is to fulfil specific data demands,
  • Which conditions or barriers might hinder fulfilment,
  • How the reusability of a demand influences its value,
  • How pricing mechanisms could be informed by these factors.

NLP will be explored as a supporting tool for these market makers, with the understanding that the technology is still young and that automated outcomes can carry only limited responsibility. This thesis project therefore positions NLP not as an end goal but as a means to empower market makers by structuring insights and enabling better evaluation of demand-supply dynamics.

Relevance & Contribution

This project combines fundamental AI research with insights from financial markets to strengthen the design of future data economies. By positioning NLP as a supporting tool for market makers, it strikes a balance between technical exploration and real-world feasibility. The expected outcome is both scientifically novel, in combining NLP with financial market-inspired mechanisms for evaluation, and practically impactful, as it lays the foundation for transparent, evidence-based demand-supply matching in urban data ecosystems.

Research Question

How can NLP-based methods be utilised to support market makers in structuring and evaluating data demand from public urban documents, thereby informing future demand-supply matching?

Sub-questions:
  • Which NLP techniques (e.g. transformers, information extraction pipelines) can best assist in extracting contextual cues from heterogeneous public sources?
  • How can structured “Data Demand Cards” (context, problem, information need, potential application) be linked to market-making insights (feasibility, barriers, reusability, pricing)? A sketch of one possible structure follows this list.
  • What can be learned from financial markets and their tools to design mechanisms for evaluating and matching data demand and supply?
  • What are the limitations in terms of generalisability across cities, languages, and document types?
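To make the second sub-question more concrete, the sketch below shows one possible way to represent a Data Demand Card and attach market-making insights to it. This is an illustrative assumption only: the field names, types, and the DemandCard / MarketAssessment split are placeholders, not a fixed schema for the project.

    # Illustrative sketch only: all field names and structures are assumptions,
    # not a fixed schema for this project.
    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class DemandCard:
        """Structured data demand extracted from a public urban document."""
        context: str                 # e.g. the policy domain or city programme
        problem: str                 # the issue the document describes
        information_need: str        # the data the document implicitly asks for
        potential_application: str   # how the data could be used
        source_document: str         # reference to the originating document

    @dataclass
    class MarketAssessment:
        """Market-making insights attached to a demand card."""
        feasibility: float                         # rough 0-1 estimate of fulfilment feasibility
        barriers: List[str] = field(default_factory=list)  # e.g. legal, technical, organisational
        reusability: float = 0.0                   # extent to which other demands could reuse the same supply
        pricing_indicator: Optional[float] = None  # to be informed by the factors above

    @dataclass
    class EvaluatedDemand:
        """A demand card linked to its market-making evaluation."""
        card: DemandCard
        assessment: MarketAssessment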

Scope of Work

The student will:

1. Literature Review & Framing

  • Investigate NLP approaches for information extraction and classification.
  • Study financial markets as a reference point for designing data market-making mechanisms.

2. Prototype NLP Pipeline

  • Collect and preprocess a representative set of public urban documents.
  • Develop methods for extracting structured demand cards (see the extraction sketch below).
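As a rough illustration of what such an extraction step could look like, the sketch below tags passages from a public document with candidate demand-card fields using an off-the-shelf zero-shot classifier from the Hugging Face transformers library. The model choice, labels, threshold, and example passages are assumptions for illustration; the actual pipeline design is part of the thesis work.

    # Minimal sketch of one possible extraction step, assuming the Hugging Face
    # transformers library; model, labels, and threshold are illustrative choices.
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification",
                          model="facebook/bart-large-mnli")

    # Candidate labels loosely mirroring the demand-card fields.
    LABELS = ["background context", "problem statement",
              "information need", "potential application"]

    def tag_passages(passages, threshold=0.5):
        """Assign each passage the demand-card field it most likely belongs to."""
        tagged = []
        for text in passages:
            result = classifier(text, candidate_labels=LABELS)
            label, score = result["labels"][0], result["scores"][0]
            if score >= threshold:
                tagged.append({"text": text, "field": label, "score": score})
        return tagged

    # Example use on two sentences from a fictional mobility strategy.
    passages = [
        "Congestion around the station area has increased sharply since 2021.",
        "The municipality needs hourly bicycle counts per corridor to plan new routes.",
    ]
    print(tag_passages(passages))

A fuller pipeline would add document collection, preprocessing (e.g. OCR and language detection for Dutch sources), and aggregation of tagged passages into demand cards, but the structure above captures the core idea of mapping free text onto card fields.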

3. Market-Making Support Layer

  • Explore methods for evaluating feasibility, barriers, reusability, and pricing of data demands (see the scoring sketch below).
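To give this evaluation layer a first tangible shape, the placeholder sketch below combines the factors named above into a single indicative score. The weights, scale, and barrier penalty are assumptions for illustration; the actual mechanism, possibly inspired by financial-market tools, is to be designed during the project.

    # Placeholder sketch: the weights, 0-1 scale, and barrier penalty are
    # assumptions, not a proposed pricing model.
    def evaluate_demand(feasibility, reusability, num_barriers,
                        w_feas=0.5, w_reuse=0.4, barrier_penalty=0.1):
        """Combine feasibility (0-1), reusability (0-1), and the number of
        identified barriers into an indicative value score between 0 and 1."""
        score = w_feas * feasibility + w_reuse * reusability
        score -= barrier_penalty * num_barriers
        return max(0.0, min(1.0, score))

    # Example: a fairly feasible, highly reusable demand with two known barriers.
    print(evaluate_demand(feasibility=0.8, reusability=0.9, num_barriers=2))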

4. Evaluation & Validation

  • Assess both technical accuracy and practical relevance for market makers.

Deliverables

  • A research thesis addressing the main and sub-questions.
  • A prototype pipeline that generates structured demand cards and links them to evaluation criteria.
  • An evaluation report including both quantitative model results and qualitative insights from market-maker use cases.
  • A demo scenario using selected DMI city data.
  • A roadmap outlining future steps toward automated demand-supply matching.

Supervision & Resources

  • Supervision by Abykys AI leads and a university-affiliated supervisor.
  • Access to internal project team and relevant document sources (public/open only).
  • Computational resources provided as needed, with the awareness that running large LLMs locally is not foreseen within the next two years due to cost and infrastructure constraints.
  • Collaboration with The Floor’s product development team for prototype integration.

Location

Hybrid / Main office Haarlem, support office Amsterdam

(Dutch proficiency not required, though basic familiarity with Dutch documents is beneficial.)

Timeline (6 months)

Month 1: Literature review, taxonomy definition, dataset preparation

Months 2-3: Model prototyping, early validation

Month 4: Model refinement, performance evaluation

Month 5: Prototype integration, dashboard development

Month 6: Write-up, final testing, defense