#!/bin/bash
# in this one gemini quickly itself broke really badly
set -e -E
GEMINI_API_KEY="$GEMINI_API_KEY"
MODEL_ID="gemini-2.5-pro-exp-03-25"
GENERATE_CONTENT_API="streamGenerateContent"
cat << EOF > request.json
{
"contents": [
{
"role": "user",
"parts": [
{
"text": "let's draft a document for querying an LLM for
exhaustive data on a list of sensed words or concepts. this is part of
a larger knowledge builder. draft a prompt."
},
]
},
{
"role": "model",
"parts": [
{
"text": "WARN: Failed to process high-priority
'{concept_to_process}'.\") # Decide how to handle fails\n
else:\n # Process low priority if high is empty\n
low_priority_processed_count = 0\n
while low_priority_processed_count <
config.MAX_LOW_PRIORITY_PER_CYCLE:\n
concept_to_process ="
},
{
"text": "Here's a breakdown of the thinking process behind
crafting that LLM prompt for exhaustive"
},
{
"text": " self.state_manager.get_next_low_priority()\n
"
},
{
"text": " data gathering:\n\n1. **Deconstruct the
Request:**\n *"
},
{
"text": "if concept_to_process is None:\n
break # Low queue empty"
},
{
"text": " **Goal:** Query an LLM for *exhaustive data* on a list"
},
{
"text": "\n\n if
self.state_manager.has_been_processed(concept"
},
{
"text": " of *sensed words or concepts*.\n * **Context:** Part"
},
{
"text": "_to_process):\n
print(f\"INFO: Skipping already processed"
},
{
"text": " of a *larger knowledge builder*. This implies
the output needs to be structured, detailed"
},
{
"text": " low-priority: '{concept_to_process}'\")\n
continue\n\n"
},
{
"text": ", and suitable for integration into a knowledge
base.\n * **Input"
},
{
"text": " if
self._process_concept(concept_to_process):\n
"
},
{
"text": ":** A *list* of words/concepts.\n *
**Output:** A document (the prompt itself) that will instruct the
LLM.\n\n2. "
},
{
"text": "processed_in_cycle += 1\n
low_priority_processed_count += "
},
{
"text": "**Identify Key Challenges & Requirements:**\n
* **Exhaustiveness:** How"
},
{
"text": "1\n
processed_this_iteration = True\n else:\n
"
},
{
"text": " do you define \"exhaustive\"? It's relative. We
need to guide the LLM towards covering many facets without asking for
*literally everything* (which is impossible"
},
{
"text": "print(f\"WARN: Failed to process low-priority
'{concept_to_process}'.\") # Handle fails\n\n # Check
exit condition\n if not"
},
{
"text": "). This means specifying *types* of
information.\n * **Structure:** The output needs to be
machine-readable or at least easily parseable for the"
},
{
"text": " processed_this_iteration:\n
consecutive_empty_cycles += "
},
{
"text": " knowledge builder. JSON or a well-defined
structured format is ideal.\n * **Clarity for LLM:** The prompt
must be unambiguous. LLMs can"
},
{
"text": "1\n print(f\"INFO: No concepts
processed this iteration ({consecutive_empty"
},
{
"text": " misinterpret vague requests.\n *
**Scalability:** The prompt should work well whether the input list
has one item or many.\n * **Consistency"
},
{
"text": "_cycles} consecutive empty).\")\n
if consecutive_empty_cycles >= 2: # Exit after 2 cycles with
nothing to do\n print(\"INFO:"
},
{
"text": ":** The output format should be consistent across
all concepts.\n * **Handling Ambiguity:** Words can have multiple
meanings (polysemy). The prompt needs to address this.\n *
**Source/Confidence (Optional but Good"
},
{
"text": " Queues appear empty. Exiting loop.\")\n
break\n # Optional: Add a longer
sleep if queues are temporarily empty\n
time.sleep(5"
},
{
"text": "):** For a knowledge builder, knowing the source
or confidence level of information can be valuable, though often hard
for current LLMs to provide reliably. We can ask, but manage
expectations.\n * **Avoiding Hallucinations:** While hard to
prevent"
},
{
"text": ")\n else:\n
consecutive_empty_cycles = 0 # Reset counter if something was
processed\n\n # Optional: Periodic state saving (atexit
handles final save)\n # if processed_in_cycle > 0 and
processed"
},
{
"text": " entirely, structuring the request and asking for
specific *types* of factual information can help mitigate this.\n\n3.
**Brainstorm Prompt Components:**\n * **Role/Persona:** Setting a
role helps the LLM focus (e.g.,"
},
{
"text": "_in_cycle % 20 == 0:\n #
self.state_manager.save_state()\n\n # Optional: Small
delay between concepts\n # time.sleep(0.5)\n\n
except KeyboardInterrupt:\n print(\"\\nINFO: Keyboard
interrupt received. Shutting down gracefully...\")\n finally:"
},
{
"text": " \"You are a knowledge extraction expert\").\n
* **Objective:** Clearly state the goal (e.g., \"Extract
comprehensive information...\").\n * **Input:** Define how the
input list will be provided.\n * **Output Format"
},
{
"text": "\n # State saving is handled by atexit
in StateManager\n print(\"Knowledge Builder run
finished.\")\n\n```\n\n**(f) `main.py` (Updated Entry
Point)**\n\n```python\n# knowledge_graph_"
},
{
"text": ":** Specify the desired structure (JSON preferred
for knowledge builders). Define the schema.\n * **Information
Categories (The \"Exhaustive\" Part):** This is crucial. List the
*types* of data needed for each concept. Examples:\n"
},
{
"text": "builder/main.py\n\n# Ensure the script can find
the package modules when run directly\n# This might be needed
depending on how you run it (python main.py vs python -m ...)\nimport
sys\nimport os\nfrom pathlib import Path\n"
},
{
"text": " * Definition(s)\n *
Etymology\n * Key Properties/Attributes\n *
Examples\n * Contexts of Use\n * Related Concepts
(Synonyms, Antonyms, Hyponyms, Hypernyms, Meronyms, Holonyms)\n
* Common Misconceptions\n"
},
{
"text": "# Add project root to path if
necessary\nproject_root = Path(__file__).parent.parent\nif
str(project_root) not in sys.path:\n"
},
{
"text": " * Historical Significance (if
applicable)\n * Cultural Significance (if applicable)\n
* Applications/Uses (if applicable)\n * Disambiguation
(handling multiple meanings)\n * **Instructions:** Provide
specific guidance on how to fill the structure.\n * Be
thorough.\n *"
},
{
"text": " sys.path.insert(0,
str(project_root))\n\n\nfrom knowledge_graph_builder.knowledge_builder
import KnowledgeBuilder\n\ndef run_builder():\n \"\"\"Instantiates
and runs the KnowledgeBuilder.\"\"\"\n try"
},
{
"text": " Handle ambiguity.\n * Stick to the
format.\n * Indicate missing information.\n *
Process each item independently.\n * **Example (Optional but
Recommended):** Show the LLM *exactly* what a good output looks like
for one concept. This drastically improves compliance.\n * "
},
{
"text": ":\n builder = KnowledgeBuilder()\n
builder.run()\n except Exception as e:\n print(f\"\\nFATAL
ERROR in"
},
{
"text": "**Placeholder for Input:** Clearly mark where the
list of concepts will be inserted.\n\n4. **Drafting - Iteration 1
(Mental Outline):**\n * \"Okay, LLM, I need info on these words:
[list]. Give me everything you know, like definitions, examples,
related words"
},
{
"text": " Knowledge Builder: {e}\")\n # Consider
more detailed logging here\n sys.exit(1)\n\nif __name__ ==
\"__main__\":\n # If running `python
knowledge_graph_builder/main.py`\n run_builder()\n\n # If you
prefer running as a module `"
},
{
"text": ". Put it in JSON.\"\n * *Critique:* Too
vague. \"Everything\" is undefined. \"Related words\" is ambiguous.
JSON structure isn't specified.\n\n5. **Drafting - Iteration 2
(Adding Structure & Detail):**\n * \"Act as a knowledge bot. For
each word in [list], give me a JSON object with keys: '"
},
{
"text": "python -m knowledge_graph_builder.main`\n #
the imports should work without sys.path modification if your project
root\n # is in the PYTHONPATH or you run from the project root
directory.\n # The `if __name__ == \"__main__\":` block still works
correctly.\n```\n\n**4. How"
},
{
"text": "definition', 'examples', 'synonyms',
'antonyms'.\"\n * *Critique:* Better, but not \"exhaustive.\"
Missing many important facets like context, properties,
disambiguation. Still needs more guidance.\n\n6. **Drafting -
Iteration 3 (Expanding Categories, Specifying Format - Closer to
Final):**\n * \"You are a knowledge extraction assistant. Your
task is"
},
{
"text": " to Run**\n\n1. Make sure you have
`google-generativeai`, `python-dotenv`, and `jsonldb` installed (`pip
install ...`).\n2. Create a `.env` file in your `your_project_root`
directory with your `GEMINI_API_KEY="
},
{
"text": " to generate comprehensive data for each concept
in the provided list. Output MUST be a JSON array, where each element
corresponds to a concept from the input list. Each element should be a
JSON object with the following keys: `concept_name`, `definitions`
(array of objects with `sense` and `definition`), `etymology`, `key"
},
{
"text": "YOUR_API_KEY_HERE`.\n3. Navigate to your
`your_project_root` directory in the terminal.\n4. Run the builder:
`python -m knowledge_graph_builder.main` or `python
knowledge_graph_builder/main.py`.\n\n**Key Changes and
Benefits:**\n\n* "
},
{
"text": "_properties` (array of strings), `examples`
(array of strings), `contexts` (array of strings), `related_concepts`
(object with `synonyms`, `antonyms`, etc.), `disambiguation_notes`,
`common_misconceptions` (array of strings). Process this list:
[list].\"\n * *Critique:* Much better. Covers key areas. JSON
structure is defined. Still"
},
{
"text": "**Object-Oriented:** Code is organized into
classes (`DatabaseManager`, `LLMClient`, `StateManager`,
`KnowledgeBuilder`, `PromptLoader`) with clear responsibilities.\n*
**Triple-Based:** The LLM is now prompted to generate SPO triples,
offering more flexibility than the fixed structure. `DatabaseManager`
handles storing these as lists within the concept's data.\n*
**Externalized Prompts:**"
},
{
"text": " needs refinement on instructions, handling
missing data, and potentially an example.\n\n7. **Drafting -
Iteration 4 (Refining Instructions, Adding Example, Final Polish -
Leading to the Provided Example):**\n * Add clear instructions
about thoroughness, handling multiple senses explicitly within
`definitions`, specifying how to handle missing info (`null` or empty
arrays).\n * Include the `disambiguation_notes`"
},
{
"text": " The LLM prompt is in a separate `.txt` file,
making it easy to modify without changing Python code. `PromptLoader`
handles loading it.\n* **Clearer Orchestration:** `KnowledgeBuilder`
manages the overall workflow, coordinating the other components.\n*
**Improved Maintainability:** This structure is easier to understand,
test, and extend. For example, adding support for a different LLM
would"
},
{
"text": " field to explicitly address polysemy.\n *
Add fields like `historical_significance`, `cultural_significance`,
`applications` where relevant.\n * Refine the `related_concepts`
structure.\n * Add a clear \"Instructions\" section.\n *
Create a concrete example for one concept (e.g., \"Algorithm\") to
illustrate the desired output format and level of detail.\n * Use
clear placeholders like `[Your List of Concepts Here"
},
{
"text": " primarily involve modifying `LLMClient`. Adding
new database operations belongs in `DatabaseManager`.\n\nRemember to
refine the prompt in `prompts/generate_triples.txt` and potentially
the logic for identifying related concepts in
`KnowledgeBuilder._process_concept` based on the specific needs and
patterns you observe in your generated data."
},
{
"text": "]`.\n * Add constraints like \"Focus on
factual information\" and \"Avoid personal opinions.\"\n * Ensure
the overall tone is clear, direct, and task-oriented.\n\n8.
**Self-Correction/Review:**\n * Is the JSON structure robust?
Yes, uses arrays and objects appropriately.\n * Is \"exhaustive\"
addressed reasonably? Yes, by breaking it down into specific
categories.\n * Is it clear how to handle ambiguity? Yes, via the
`definitions` array structure and `disambiguation_notes`.\n * Is
the output format suitable for a knowledge builder? Yes, structured
JSON is ideal.\n * Is the prompt clear for the LLM? Yes, includes
role, objective"
},
]
},
{
"role": "user",
"parts": [
{
"text": "INSERT_INPUT_HERE"
},
]
},
],
"generationConfig": {
"responseMimeType": "text/plain",
},
}
EOF
curl \
-X POST \
-H "Content-Type: application/json" \
"https://generativelanguage.googleapis.com/v1beta/models/${MODEL_ID}:${GENERATE_CONTENT_API}?key=${GEMINI_API_KEY}"
-d '@request.json'