Derive structured data using LLMGraphTransformer #

Langchains LLMGraphTransformer is a specialized tool, often found within AI frameworks, designed to convert unstructured text into a structured knowledge graph. Its core function is to use a large language model to systematically identify entities, relationships, and their properties from a document. It’s important to note that this technology is still in an experimental state and requires careful configuration to achieve precise and reliable outputs.

Using LLMGraphTransformer with default prompt #

When using LLMGraphTransformer, you can either rely on its default prompt or provide a custom one to guide the LLM’s output. The default prompt is designed for a general-purpose extraction of nodes and relationships from unstructured text, which is great for a quick start. However, using a custom prompt gives you fine-grained control over the extraction process, allowing you to specify exactly what types of entities, relationships, and properties you want the model to identify. This is especially useful for domain-specific tasks where you need the LLM to adhere to a predefined schema, ensuring the resulting knowledge graph is both accurate and structured for your specific needs.

Setup: #

LLMs	Gemini v1.5 Flash, OpenAI GPT-4o
Prompt	Default/Custom
Text	REGULATION (EU) 2022/2554 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL (DORA): - CHAPTER II - ICT risk management, Section I, Article 5, 2nd paragraph

For a simple visualization of the resulting graph I have used PyVis.

Objective: #

Provide a visual (graph) representation of a text for better and faster understanding.
Explore cpabilities and limitations of LLMGraphTransformer
Assess accurancy

Test Cases #

Test Case 01: Default prompt, no allowed_nodes/allowed_relationships #

Let´s feed in a first step LLMGraphTransformer with the text only.

Graph #

Click on below image to view a larger and interactive version of the graph in a new tab. See the json file as provided by LLMGraphTransformer here

Click here to view a larger version in a new tab

Click to open interactive Graph in new tab

Analysis: #

Verifying the extraction of entities #

From the original text I have (manually) extracted the following 67 key terms (entities) I expected to find in the graph:

Fold out to see manually extracted key terms

Article 11(1)
Article 11(3)
Article 13(6)
Article 6(1)
Article 6(8)
Article 6(8), point (b)
arrangements
authenticity
availability
budget
business continuity policy
changes
communication
confidentiality
cooperation
coordination
corporate level
corrective measure
critical or important functions
data
digital
digital operational resilience needs
digital operational resilience strategy
digital operational resilience training
effective communication
financial entity
functions
governance arrangements
ICT
ICT audits
ICT business continuity policy
ICT related functions
ICT response plans
ICT recovery plans
ICT risk
ICT risk management framework
ICT security awareness programmes
ICT services
ICT skills
ICT third-party service providers
implementation
impact
incidents
integrity
internal audit plans
management body
material changes
material modifications
operational resilience strategy
policy
policy on arrangements regarding the use of ICT services
recovery measure
reporting channels
resources
response and recovery plan
response measure
responsibility
responsibility for managing the financial entity’s ICT risk
risk analysis summary
risk tolerance level
roles
staff
communication
cooperation
coordination
responsibility

LLMGraphTransformer identified (depending on the LLM used) somewhere between 25 and 30 entities, so less than half.

Test Case 02: Default prompt with allowed_nodes, without allowed_relationships #

Now that we have a list of manually identified entities, we can pass it to the LLMGraphTransformer and get the following graph:

Click to open interactive Graph in new tab

Summary #

Above example shows the importance of Named Entity Recognition (NER) in Document Analysis. If we are aiming to get a full picture of the entities and their relations with each other, we require a proper ontology.
Let´s explore options on how to efficiently build this ontology, especially in a specialized or technical domain.