LLMGraphTransformer

Derive structured data using LLMGraphTransformer #

LangChain's LLMGraphTransformer converts unstructured text into a structured knowledge graph. It uses a large language model to identify entities, relationships, and their properties in a document. It ships in the langchain_experimental package, so it is still experimental and needs careful configuration to produce precise and reliable output.

Using LLMGraphTransformer with default prompt #

When using LLMGraphTransformer, you can either rely on its default prompt or provide a custom one to guide the LLM’s output. The default prompt is designed for a general-purpose extraction of nodes and relationships from unstructured text, which is great for a quick start. However, using a custom prompt gives you fine-grained control over the extraction process, allowing you to specify exactly what types of entities, relationships, and properties you want the model to identify. This is especially useful for domain-specific tasks where you need the LLM to adhere to a predefined schema, ensuring the resulting knowledge graph is both accurate and structured for your specific needs.
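To make the difference concrete, here is a minimal sketch of how a custom prompt could be wired in. It assumes the transformer accepts a ChatPromptTemplate through its `prompt` parameter and fills a single `input` variable with the document text, as the default prompt appears to do. The helper names `build_system_prompt` and `make_custom_transformer`, the prompt wording, and any type names passed in are my own illustrations, not part of the library.

```python
def build_system_prompt(node_types, relationship_types):
    """Assemble a schema-constrained system message (wording is illustrative)."""
    lines = [
        "You extract a knowledge graph from regulatory text.",
        "Only use these node types: " + ", ".join(node_types) + ".",
        "Only use these relationship types: " + ", ".join(relationship_types) + ".",
        "Do not invent entities that are not stated in the text.",
    ]
    return "\n".join(lines)


def make_custom_transformer(llm, node_types, relationship_types):
    """Create an LLMGraphTransformer that uses the custom prompt above."""
    # Lazy imports: requires `pip install langchain-core langchain-experimental`.
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_experimental.graph_transformers import LLMGraphTransformer

    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", build_system_prompt(node_types, relationship_types)),
            # Assumption: the transformer supplies the text as the `input` variable.
            ("human", "Extract the graph from the following input: {input}"),
        ]
    )
    return LLMGraphTransformer(llm=llm, prompt=prompt)
```

Keeping the schema text in its own function makes it easy to review and version the prompt separately from the LangChain wiring. Type names containing curly braces would need escaping, because the template treats braces as variables.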

Setup: #

  • LLMs: Gemini v1.5 Flash, OpenAI GPT-4o
  • Prompt: Default/Custom
  • Text: Regulation (EU) 2022/2554 of the European Parliament and of the Council (DORA), Chapter II (ICT risk management), Section I, Article 5, 2nd paragraph

For a simple visualization of the resulting graph I have used PyVis.
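A rough sketch of how such a PyVis rendering might look is shown here. It assumes the extracted graph has already been flattened into plain dictionaries with `id`/`type` keys for nodes and `source`/`target`/`type` keys for relationships. That record layout and the helper names `to_vis_elements` and `render_html` are my own choices for illustration.

```python
def to_vis_elements(nodes, relationships):
    """Convert plain node/relationship records into PyVis-ready nodes and edges."""
    vis_nodes = {}
    for node in nodes:
        # One visual node per id; the node type becomes the hover tooltip.
        vis_nodes[node["id"]] = {"label": node["id"], "title": node.get("type", "")}
    vis_edges = []
    for rel in relationships:
        # Make sure both endpoints exist, even if the node list omitted them.
        for end in (rel["source"], rel["target"]):
            vis_nodes.setdefault(end, {"label": end, "title": ""})
        vis_edges.append((rel["source"], rel["target"], rel["type"]))
    return vis_nodes, vis_edges


def render_html(vis_nodes, vis_edges, path="graph.html"):
    """Write an interactive HTML graph; requires `pip install pyvis`."""
    from pyvis.network import Network  # lazy import keeps the helper above dependency-free

    net = Network(directed=True, height="750px", width="100%")
    for node_id, attrs in vis_nodes.items():
        net.add_node(node_id, label=attrs["label"], title=attrs["title"])
    for source, target, rel_type in vis_edges:
        net.add_edge(source, target, label=rel_type)
    net.save_graph(path)
```

Splitting the conversion from the rendering means the node and edge lists can be inspected or tested without PyVis installed.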

Objective: #

  • Provide a visual (graph) representation of a text for better and faster understanding.
  • Explore the capabilities and limitations of LLMGraphTransformer.
  • Assess accuracy.

Test Cases #

Test Case 01: Default prompt, no allowed_nodes/allowed_relationships #

As a first step, let's feed LLMGraphTransformer the text only.
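In code, this first run could look roughly like the sketch below. It assumes the langchain-experimental and langchain-core packages are installed and that `llm` is any LangChain chat model. `extract_graph` and `summarize` are hypothetical helper names, and the attribute names used in `summarize` reflect my understanding of the graph document objects the transformer returns.

```python
def extract_graph(llm, text):
    """Run LLMGraphTransformer with its default prompt on one piece of text."""
    # Lazy imports: requires `pip install langchain-experimental langchain-core`.
    from langchain_core.documents import Document
    from langchain_experimental.graph_transformers import LLMGraphTransformer

    # No allowed_nodes / allowed_relationships: the LLM decides what to extract.
    transformer = LLMGraphTransformer(llm=llm)
    graph_documents = transformer.convert_to_graph_documents(
        [Document(page_content=text)]
    )
    return graph_documents[0]


def summarize(graph_document):
    """Flatten a graph document into plain node and relationship records."""
    nodes = [{"id": n.id, "type": n.type} for n in graph_document.nodes]
    relationships = [
        {"source": r.source.id, "target": r.target.id, "type": r.type}
        for r in graph_document.relationships
    ]
    return nodes, relationships
```

The `llm` argument could be, for example, `ChatOpenAI(model="gpt-4o", temperature=0)` from the langchain_openai package. The flattened records from `summarize` are convenient for saving as JSON or for comparing runs across models.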

Graph #

[Figure: graph generated by LLMGraphTransformer with the default prompt. The original page links to a larger interactive version and to the JSON output returned by LLMGraphTransformer.]

Analysis: #

Verifying the extraction of entities #

From the original text I have (manually) extracted the following 67 key terms (entities) I expected to find in the graph:

  • Article 11(1)

  • Article 11(3)

  • Article 13(6)

  • Article 6(1)

  • Article 6(8)

  • Article 6(8), point (b)

  • arrangements

  • authenticity

  • availability

  • budget

  • business continuity policy

  • changes

  • communication

  • confidentiality

  • cooperation

  • coordination

  • corporate level

  • corrective measure

  • critical or important functions

  • data

  • digital

  • digital operational resilience needs

  • digital operational resilience strategy

  • digital operational resilience training

  • effective communication

  • financial entity

  • functions

  • governance arrangements

  • ICT

  • ICT audits

  • ICT business continuity policy

  • ICT related functions

  • ICT response plans

  • ICT recovery plans

  • ICT risk

  • ICT risk management framework

  • ICT security awareness programmes

  • ICT services

  • ICT skills

  • ICT third-party service providers

  • implementation

  • impact

  • incidents

  • integrity

  • internal audit plans

  • management body

  • material changes

  • material modifications

  • operational resilience strategy

  • policy

  • policy on arrangements regarding the use of ICT services

  • recovery measure

  • reporting channels

  • resources

  • response and recovery plan

  • response measure

  • responsibility

  • responsibility for managing the financial entity’s ICT risk

  • risk analysis summary

  • risk tolerance level

  • roles

  • staff

  • communication

  • cooperation

  • coordination

  • responsibility

Depending on the LLM used, LLMGraphTransformer identified between 25 and 30 entities, which is less than half of the expected terms.
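A comparison like this can also be automated. The sketch below computes the share of expected terms that appear among the extracted node ids, using exact matching after lowercasing and whitespace normalization. That matching rule is an assumption of mine; it will miss paraphrases or plural forms, so a manual review is still useful.

```python
def normalize(term):
    """Lowercase and collapse whitespace so 'ICT Risk' matches 'ict  risk'."""
    return " ".join(term.lower().split())


def entity_coverage(expected, extracted):
    """Return (recall, found, missing) for expected terms versus extracted ids."""
    expected_set = {normalize(t) for t in expected}
    extracted_set = {normalize(t) for t in extracted}
    found = expected_set & extracted_set
    missing = expected_set - extracted_set
    # Recall: share of the expected terms that the graph actually contains.
    recall = len(found) / len(expected_set) if expected_set else 0.0
    return recall, sorted(found), sorted(missing)
```

The `missing` list is the more useful output in practice, since it shows which terms the model skipped.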

Test Case 02: Default prompt with allowed_nodes, without allowed_relationships #

Now that we have a list of manually identified entities, we can pass it to the LLMGraphTransformer and get the following graph:

[Figure: graph generated by LLMGraphTransformer with allowed_nodes set to the manually extracted terms. The original page links to a larger interactive version.]
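For reference, passing the term list might look like the following sketch. It removes repeated terms first, then hands the list to the transformer through `allowed_nodes`. The helper names `dedupe_terms` and `extract_with_allowed_nodes` are hypothetical. As far as I understand the documentation, `allowed_nodes` is described as the list of permitted node types, so how the model treats a list of concrete terms is worth checking against the output.

```python
def dedupe_terms(terms):
    """Remove repeated terms (case-insensitive) while keeping first-seen order."""
    seen = set()
    unique = []
    for term in terms:
        key = " ".join(term.lower().split())
        if key and key not in seen:
            seen.add(key)
            unique.append(term.strip())
    return unique


def extract_with_allowed_nodes(llm, text, allowed_nodes):
    """Run LLMGraphTransformer with the default prompt and a restricted node list."""
    # Lazy imports: requires `pip install langchain-experimental langchain-core`.
    from langchain_core.documents import Document
    from langchain_experimental.graph_transformers import LLMGraphTransformer

    transformer = LLMGraphTransformer(
        llm=llm,
        allowed_nodes=dedupe_terms(allowed_nodes),  # allowed_relationships left unset
    )
    return transformer.convert_to_graph_documents([Document(page_content=text)])
```

Deduplicating up front keeps the list sent to the model short and avoids the same term appearing twice in the constraint.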

Summary #

The example above shows the importance of Named Entity Recognition (NER) in document analysis. To get a full picture of the entities and their relations with each other, we need a proper ontology.
Let's explore how to build this ontology efficiently, especially in a specialized or technical domain.