๐ŸŽ ์‹ํ’ˆ์—์„œ AI ๊ณต๋ถ€ํ•˜๊ธฐ

GraphRAG: Implementing It with Neo4j GenAI

FoodAI 2025. 4. 10. 12:00

💡 Introduction

As AI technology advances, health and nutrition information is becoming far more accessible. Today we look at how GraphRAG (graph-based Retrieval-Augmented Generation), which combines the Neo4j graph database with generative AI, can deliver more accurate and context-rich nutrition information. We explore what the technique can do through a case study built on the 2020 Dietary Reference Intakes for Koreans.


I. Understanding Neo4j GenAI

Neo4j GenAI is a Python package that helps you put generative AI to work on top of a graph database. It connects OpenAI with the Neo4j driver so that AI responses can be grounded in a knowledge graph.

https://neo4j.com/generativeai/

Neo4j GenAI์˜ ์ฃผ์š” ํŠน์ง•

Neo4j GenAI๋Š” ๋ฒกํ„ฐ ๊ฒ€์ƒ‰, ์ง€์‹ ๊ทธ๋ž˜ํ”„, ๋ฐ์ดํ„ฐ ์‚ฌ์ด์–ธ์Šค๋ฅผ ํ†ตํ•ฉํ•˜์—ฌ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์žฅ์ ์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค:

  • ๋†’์€ ์ •ํ™•๋„์˜ ์‘๋‹ต ์ƒ์„ฑ
  • ํ’๋ถ€ํ•œ ๋งฅ๋ฝ ์ •๋ณด ์ œ๊ณต
  • ๊นŠ์€ ์„ค๋ช… ๊ฐ€๋Šฅ์„ฑ(explainability) ํ™•๋ณด

์ด๋Ÿฌํ•œ ํŠน์ง•๋“ค์€ ์˜์–‘ ์ •๋ณด์™€ ๊ฐ™์ด ์ •ํ™•์„ฑ๊ณผ ๋งฅ๋ฝ์ด ์ค‘์š”ํ•œ ๋ถ„์•ผ์—์„œ ํŠนํžˆ ๊ฐ€์น˜๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ์˜์–‘์†Œ ๊ฐ„์˜ ๋ณต์žกํ•œ ๊ด€๊ณ„์™€ ๊ฑด๊ฐ• ์ƒํƒœ์— ๋ฏธ์น˜๋Š” ์˜ํ–ฅ์„ ๊ทธ๋ž˜ํ”„ ๊ตฌ์กฐ๋กœ ํ‘œํ˜„ํ•จ์œผ๋กœ์จ ๋” ๋ช…ํ™•ํ•œ ์ •๋ณด ์ „๋‹ฌ์ด ๊ฐ€๋Šฅํ•ด์ง‘๋‹ˆ๋‹ค.

# Example: OpenAI setup
import os
os.environ["OPENAI_API_KEY"] = "sk-your-api-key"  # placeholder; use your own key

# Example: Neo4j driver connection
from neo4j import GraphDatabase

# Connect to Neo4j
uri = "bolt://localhost:7687"  # Neo4j Bolt URL
user = "neo4j"                # Neo4j username
password = "password"         # password

driver = GraphDatabase.driver(uri, auth=(user, password))
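
The connection values above are hard-coded for the demo. As a minimal sketch, they could instead be read from environment variables. The helper name `neo4j_settings` and the variable names `NEO4J_URI`, `NEO4J_USER`, and `NEO4J_PASSWORD` are assumptions for illustration, not part of the original project.

```python
import os


def neo4j_settings(env=os.environ):
    """Return (uri, (user, password)) from environment-style settings.

    The variable names are hypothetical; adjust them to your deployment.
    """
    uri = env.get("NEO4J_URI", "bolt://localhost:7687")
    user = env.get("NEO4J_USER", "neo4j")
    password = env.get("NEO4J_PASSWORD")
    if not password:
        # Fail early rather than fall back to a hard-coded password.
        raise RuntimeError("NEO4J_PASSWORD is not set")
    return uri, (user, password)
```

With a running database, the result can be passed straight to the driver: `uri, auth = neo4j_settings()` followed by `GraphDatabase.driver(uri, auth=auth)`.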

II. Implementing GraphRAG on Nutrition Data

1. Preparing the source document and splitting it into text chunks

This project uses the "2020 Dietary Reference Intakes for Koreans (Macronutrients)" as its source document. It covers recommended intakes and related information for the major nutrients: energy, carbohydrates, dietary fiber, protein, lipids, and so on.

The document was processed as follows:

  • Convert the original PDF to Markdown
  • Split the text into paragraphs on '\n\n'
  • Cap each chunk at 2,000 characters to keep it a manageable size
  • Convert tabular data to Markdown tables to preserve the structured information

Processing step              Description
Source document collection   Obtain the 2020 Dietary Reference Intakes for Koreans PDF
Format conversion            Convert the PDF to Markdown
Text chunking                Split by paragraph and apply the length limit

2. Extracting entity instances from the text chunks

To structure the meaningful information in each chunk, an LLM (Large Language Model) extracts entities, relationships, and claims. For this step I rewrote the template provided by Neo4j in Korean and added instructions specific to the nutrition domain.

https://neo4j.com/docs/neo4j-graphrag-python/current/api.html

In particular, the template reflects the following nutrition-related considerations:

  • The nutrient classification system (macronutrients, micronutrients, etc.)
  • Interactions between nutrients
  • Recommended intakes by age group and sex
  • Associations with health conditions

import re


class KoreanNutritionTemplate:
    """Entity/relationship extraction prompt for Korean nutrition documents.

    The prompt text is given here in English; the version used in the
    project was written in Korean.
    """

    def __init__(self):
        self.base_prompt = """You are a top-tier algorithm that extracts information for building a knowledge graph.

Extract the following information from the input text or table:
1. Extract entities (nodes) and assign their types
2. Identify the relationships between entities
3. Assign a unique ID to each node
4. Include the relevant properties of nodes and relationships

When processing tabular data, in particular:
1. Identify the main entities from column headers and values
2. Determine relationships based on the table structure
3. Determine entity types from the table title and context

# Schema and output format
Schema to use: {schema}

Return the result in the following JSON format:
{{
  "nodes": [{{
    "id": "string",
    "label": "entity type",
    "properties": {{}}
  }}],
  "relationships": [{{
    "type": "relationship type",
    "start_node_id": "string",
    "end_node_id": "string",
    "properties": {{}}
  }}]
}}

Notes:
- Assign unique IDs and reuse them when defining relationships
- Use only the node and relationship types defined in the schema
- Include all relevant properties
- Output JSON only

Examples: {examples}

Input: {text}"""

    def format(self, schema='', text='', examples=''):
        # A chunk contains a table if it has a caption such as '표 3'
        # ('표' is Korean for 'table') or an HTML <table> tag.
        has_table = bool(re.search(r'(표\s+\d+|<table>)', text))

        if has_table:
            prompt = f"{self.base_prompt}\nExtract information from both the text and the tables."
        else:
            prompt = self.base_prompt

        # Literal braces in the JSON example are escaped as {{ }} above,
        # so only schema, text, and examples are substituted here.
        return prompt.format(schema=schema, text=text, examples=examples)

    def add_domain_specific_instructions(self, domain_type='nutrition'):
        """Append domain-specific instructions to the base prompt."""
        if domain_type == 'nutrition':
            self.base_prompt += """

Additional considerations for nutrition:
- The nutrient classification system (macronutrients, micronutrients, etc.)
- Interactions between nutrients
- Recommended intakes by age group and sex
- Associations with health conditions
"""

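import re
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup


def split_by_paragraphs(text, max_chars=2000):
    """Group paragraphs into chunks of roughly max_chars characters."""
    # Split on blank lines, except when the next block starts with '<',
    # so an HTML table stays attached to the paragraph before it.
    paragraphs = re.split(r'\n\n(?!<)', text)
    chunks = []
    current = ""

    for p in paragraphs:
        if len(current) + len(p) < max_chars:
            current += p + "\n\n"
        else:
            # Flush the chunk built so far; skip it if it is still empty.
            # A single paragraph longer than max_chars is kept whole.
            if current.strip():
                chunks.append(current.strip())
            current = p + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks


def extract_and_process_tables(md_text):
    """Replace each HTML <table> in the text with a Markdown table."""
    chunks = split_by_paragraphs(md_text, max_chars=2000)
    processed_chunks = []
    for chunk in chunks:
        tables = re.findall(r'<table>.*?</table>', chunk, re.DOTALL)
        processed_chunk = chunk

        for table in tables:
            soup = BeautifulSoup(table, 'html.parser')
            df = pd.read_html(StringIO(str(soup)))[0]
            markdown_table = df.to_markdown()  # requires the 'tabulate' package
            processed_chunk = processed_chunk.replace(table, markdown_table)

        processed_chunks.append(processed_chunk)

    return '\n\n'.join(processed_chunks)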

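from neo4j_graphrag.llm import OpenAILLM

# Illustrative USD rates per 1,000 tokens; check current pricing for your model.
INPUT_RATE_PER_1K = 0.03
OUTPUT_RATE_PER_1K = 0.06


def estimate_cost(text):
    """Return (input_tokens, expected_output_tokens, total_cost_usd) for one chunk."""
    # Rough heuristic: about 4 characters per token.
    # Korean text may use more tokens per character than this assumes.
    input_tokens = len(text) // 4
    expected_output_tokens = 1000

    input_cost = input_tokens * INPUT_RATE_PER_1K / 1000
    output_cost = expected_output_tokens * OUTPUT_RATE_PER_1K / 1000
    total_cost = input_cost + output_cost

    return input_tokens, expected_output_tokens, total_cost


def process_er_extraction(text):
    """Extract entities and relationships from every chunk of the text."""
    template = KoreanNutritionTemplate()
    template.add_domain_specific_instructions()  # add the nutrition-specific notes
    llm = OpenAILLM(model_name="gpt-4o", model_params={"temperature": 0})

    chunks = split_by_paragraphs(text)
    total_cost = sum(estimate_cost(chunk)[2] for chunk in chunks)

    # Ask for confirmation before spending money on API calls
    if input(f"Estimated cost: ${total_cost:.2f}, proceed? (y/n): ").lower() != 'y':
        return []

    responses = []
    for chunk in chunks:
        processed = extract_and_process_tables(chunk)
        prompt = template.format(text=processed)
        responses.append(llm.invoke(prompt).content)

    return responses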
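
Each response is expected to be a JSON string in the format requested by the prompt. The post does not show how the responses were parsed, so the following is only a sketch under that assumption. The helper names `strip_code_fence` and `parse_extraction` are hypothetical. It strips an optional Markdown code fence, parses the JSON, and drops relationships that point at node IDs missing from the same response.

```python
import json

FENCE = "`" * 3  # a Markdown code fence, which models sometimes wrap around JSON


def strip_code_fence(raw):
    """Remove a leading and trailing code fence if the response has one."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (it may carry a language tag such as "json").
        text = text.split("\n", 1)[1] if "\n" in text else ""
        if text.rstrip().endswith(FENCE):
            text = text.rstrip()[: -len(FENCE)]
    return text.strip()


def parse_extraction(raw):
    """Parse one LLM response into {'nodes': [...], 'relationships': [...]}."""
    data = json.loads(strip_code_fence(raw))
    nodes = data.get("nodes", [])
    node_ids = {node["id"] for node in nodes}

    # Keep only relationships whose endpoints were extracted as nodes.
    relationships = [
        rel for rel in data.get("relationships", [])
        if rel["start_node_id"] in node_ids and rel["end_node_id"] in node_ids
    ]
    return {"nodes": nodes, "relationships": relationships}
```

Filtering dangling relationships here is a design choice: it keeps later graph writes from failing on endpoints that do not exist, at the cost of silently discarding some model output.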

3. Checking the data schema in Neo4j

The extracted entities and relationships are stored in the Neo4j graph database. The main node and relationship types for the nutrition data are:

  • Node types: Nutrient, Food, HealthCondition, AgeGroup, etc.
  • Relationship types: HAS_NUTRIENT (contains), DECREASES_RISK_OF (reduces the risk of), BELONGS_TO (belongs to), etc.

This structure can express compound relationships such as "which food contains which nutrient, and which health conditions that nutrient affects."
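
The post does not show how the extracted JSON was written to the database. One way to do it, sketched here under the assumption that every node carries an `id`, a `label`, and optional `properties` as in the extraction prompt, is to turn each node and relationship into a parameterized Cypher `MERGE`. The function names `to_cypher` and `_safe` are hypothetical.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _safe(name):
    """Labels and relationship types cannot be Cypher parameters, so validate them."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsafe label or relationship type: {name!r}")
    return name


def to_cypher(extraction):
    """Return a list of (query, parameters) pairs for one extraction result."""
    statements = []
    for node in extraction["nodes"]:
        query = f"MERGE (n:{_safe(node['label'])} {{id: $id}}) SET n += $props"
        params = {"id": node["id"], "props": node.get("properties", {})}
        statements.append((query, params))
    for rel in extraction["relationships"]:
        query = (
            "MATCH (a {id: $start}), (b {id: $end}) "
            f"MERGE (a)-[r:{_safe(rel['type'])}]->(b) SET r += $props"
        )
        params = {
            "start": rel["start_node_id"],
            "end": rel["end_node_id"],
            "props": rel.get("properties", {}),
        }
        statements.append((query, params))
    return statements
```

With an open driver, each pair could be run inside a session, for example `session.run(query, params)`. Using `MERGE` rather than `CREATE` means the same entity extracted from two chunks ends up as a single node.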

4. Implementing Text2CypherRetriever

Converting the user's natural-language question into a Cypher query that Neo4j can run is the core of GraphRAG. To support this, the prompt includes the database schema together with examples for few-shot learning.

An example of converting a natural-language query:

from neo4j_graphrag.retrievers import Text2CypherRetriever

# Initialize the Text2CypherRetriever.
# `driver` is the Neo4j driver and `llm` an OpenAILLM instance;
# `neo4j_schema` is a text description of the graph schema and
# `examples` is a list of few-shot question/Cypher pairs.
retriever = Text2CypherRetriever(
    driver=driver,
    llm=llm,
    neo4j_schema=neo4j_schema,
    examples=examples,
)

# Process the user query
query_text = "Recommend a calcium-rich diet for women in their 20s."
search_result = retriever.search(query_text=query_text)

# Inspect the generated Cypher query
print(search_result.metadata['cypher'])

# Result: MATCH (a:AgeGroup)-[:HAS_INTAKE]->(n:NutrientIntake), 
#      (g:Gender)-[:HAS_AGE_GROUP]->(a), (f:Food)-[:HAS_NUTRIENT]->(n)
#      WHERE a.age_range = '20-29' AND g.gender = 'Female'
#      RETURN f.name LIMIT 10

With this approach, the user's question is automatically converted into a database query that retrieves the relevant nodes, and the answer is then generated from what the query returns.
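
The `neo4j_schema` and `examples` values passed to the retriever are not shown in the post. As a hedged illustration only, they might look like the following. The property names, the relationship directions, and the helper `make_example` are assumptions built from the node and relationship types listed in section 3, and the "USER INPUT / QUERY" layout follows the style used in the neo4j-graphrag documentation.

```python
# Hypothetical schema description; property names are assumed for illustration.
NEO4J_SCHEMA = """
Node properties:
Food {name: STRING}
Nutrient {name: STRING}
HealthCondition {name: STRING}
AgeGroup {age_range: STRING}
The relationships:
(:Food)-[:HAS_NUTRIENT]->(:Nutrient)
(:Nutrient)-[:DECREASES_RISK_OF]->(:HealthCondition)
"""


def make_example(question, cypher):
    """Format one few-shot pair as a single prompt line."""
    return f"USER INPUT: '{question}' QUERY: {cypher}"


# Hypothetical few-shot pairs; real ones should match the loaded graph.
EXAMPLES = [
    make_example(
        "Which foods contain calcium?",
        "MATCH (f:Food)-[:HAS_NUTRIENT]->(n:Nutrient) "
        "WHERE n.name = 'calcium' RETURN f.name LIMIT 10",
    ),
    make_example(
        "Which nutrients reduce the risk of osteoporosis?",
        "MATCH (n:Nutrient)-[:DECREASES_RISK_OF]->(h:HealthCondition) "
        "WHERE h.name = 'osteoporosis' RETURN n.name",
    ),
]
```

These would be passed as `neo4j_schema=NEO4J_SCHEMA` and `examples=EXAMPLES`. Examples that filter on a nutrient name, as the first one does, are one way to steer the model toward including that constraint in the queries it generates.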


III. Results and Limitations

Checking the results

In the actual database implementation, a test case using movie data produced good results. For example, for the query "Recommend movies in a similar genre to Titanic," it correctly returned a genre-based list of related movies.

In the current nutrition implementation, however, the nutrition information was not loaded into the graph database with an optimal schema, so some queries did not return the expected results.

Limitations and directions for improvement

The main limitations of the current implementation, and the planned improvements, are:

  • Design a graph schema that better reflects the complexity of nutrition data
  • Improve the entity extraction template so it is optimized for Korean nutrition terminology
  • Find more effective ways to represent complex relationships such as nutrient interactions
  • Expand the few-shot examples to cover a wider range of user question patterns

IV. Conclusion 🎯

Implementing GraphRAG with Neo4j GenAI shows that context-rich, accurate information can be provided in a complex knowledge domain such as nutrition. In particular, representing complex information such as the relationships between nutrients and the links between food and health as a graph can yield more meaningful answers than simple keyword search or a general-purpose AI response.

Despite its current limitations, this approach points to a range of applications: personalized nutrition recommendations, meal planning for specific health conditions, and a deeper understanding of nutrient interactions. With more refined data modeling and better-tuned query conversion, it could develop into an even more valuable tool for nutrition and health.


References:

  • 2020 Dietary Reference Intakes for Koreans, Ministry of Health and Welfare · The Korean Nutrition Society
  • Neo4j for GenAI official documentation, https://neo4j.com/docs/genai/
  • Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, Lewis et al. (2020)