Building on @mehowbrainz’s idea, I believe we should create an AI tool that benefits everyone in our ecosystem, from developers to meme creators.
I propose a community-driven initiative to gather resources for fine-tuning the GPT-3.5 model (a rough sketch of the expected training format follows the list below). Some potential data sources include:
- Whitepapers
- Medium articles
- Tweets
- Hypercore/ZenonORG forum posts
- Selected TG messages (e.g., from Kaine, George)
- Community updates
- Network codebase
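To make the target concrete: fine-tuning gpt-3.5-turbo takes a JSONL file where each line is a chat-style example. Below is a minimal sketch of how a single forum post could be mapped into that shape; the helper name, prompts, and file name are placeholders I made up for illustration, not a finalized pipeline.

```python
import json

# Hypothetical helper: the prompts and field choices are placeholders, not a finished spec.
def forum_post_to_record(title: str, body: str) -> str:
    """Convert one forum post into a single JSONL line in the
    chat-style format used for gpt-3.5-turbo fine-tuning."""
    record = {
        "messages": [
            {"role": "system",
             "content": "You are a helpful assistant for the Zenon Network community."},
            {"role": "user", "content": f"Explain this topic: {title}"},
            {"role": "assistant", "content": body},
        ]
    }
    return json.dumps(record, ensure_ascii=False)

# Append one record to a shared training file.
with open("training_data.jsonl", "a", encoding="utf-8") as f:
    f.write(forum_post_to_record("Example post title", "Example post body.") + "\n")
```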
Imagine the potential: we could even craft specialized models, like a Sigli support bot.
Though I’m not an expert, my experience with LangChain and LlamaIndex gives me some useful perspective here. Let’s also consider future-proofing our data. For instance, a tweet might be structured as follows (with a rough loading example after it):
```json
{
  "created_at": "2023-09-11T23:54:40Z",
  "type": "tweet",
  "text": "This is a tweet.",
  "username": "zenon_network",
  "media": {
    "amount": 1,
    "ipfs_url": "ipfs://QmX7a795B24k72s95x8348fX46529d8795864493587f68c"
  }
}
```
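To show why a consistent schema helps with the tooling I mentioned, here is a rough sketch of loading a record like the one above into a LlamaIndex Document (assuming the current `llama_index.core` import path; the function name is just an illustration). The idea is that the text gets indexed while timestamps, usernames, and media references ride along as metadata:

```python
import json
from llama_index.core import Document  # older releases: from llama_index import Document

def record_to_document(record: dict) -> Document:
    """Wrap one schema record into a Document: the text is indexed,
    everything else is kept as metadata so nothing is thrown away."""
    media = record.get("media") or {}
    return Document(
        text=record["text"],
        metadata={
            "type": record["type"],
            "username": record["username"],
            "created_at": record["created_at"],
            "media_ipfs_url": media.get("ipfs_url"),  # preserved for later use
        },
    )

# Using the example tweet from above:
tweet = json.loads("""
{
  "created_at": "2023-09-11T23:54:40Z",
  "type": "tweet",
  "text": "This is a tweet.",
  "username": "zenon_network",
  "media": { "amount": 1, "ipfs_url": "ipfs://QmX7a795B24k72s95x8348fX46529d8795864493587f68c" }
}
""")
doc = record_to_document(tweet)
```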
While we’re currently focused on text, I believe we should also incorporate multimedia. It’s likely that multimodal capabilities will emerge soon, and we should be ready. We can discuss data optimization later.
Unfortunately, Twitter data might be challenging to gather, but we can streamline the process for other sources once we finalize a data schema.