CurryTang/Towards-graph-foundation-models
Despite the success of foundation models in NLP, there has been no clear breakthrough for foundation models on graphs. Why are foundation models for graphs important to research?
- Being able to generalize to unseen domains without any human supervision
- Being able to handle multiple datasets and multiple tasks in a unified way
- (this is more about reasoning, which we do not emphasize much here) complex reasoning on graph-structured data (performing algorithmic reasoning with better complexity than traditional algorithms)
- I’m hired to do graph-related research 👀
In this blog post, I briefly share some recent readings on the exploration of foundation models for graphs.
We summarize these attempts into several pipelines. For each pipeline, we ask the following questions:
- Can this model (inductively) generalize to unseen nodes/graphs in a zero-shot/few-shot manner?
- Are there any benefits from fine-tuning this model on a specific dataset? Or, can this pipeline generalize knowledge across different datasets?
- How does this pipeline compare to models trained from scratch on each target dataset?
- Do more graphs in the pre-training mix lead to better performance?
- Can this pipeline handle various tasks (like node classification, link prediction, and graph classification) in a unified manner?
Pipeline 1: Graph transformer
Transformers are the backbone of foundation models in NLP, and graph transformers, their natural extension, are candidates for foundation models for graphs. Compared to GNNs, which carry strong structural inductive biases, transformers incorporate no structural inductive bias at all (absent positional encodings). On the one hand, transformers may achieve better generalizability and scaling behavior. On the other hand, they are much harder to train with limited labeled data. As a result, designing graph transformers amounts to finding a balance point for inductive biases on the spectrum between pure self-attention and GNNs.
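To make this trade-off concrete, below is a minimal PyTorch sketch of a graph transformer layer (my own illustration, not the implementation from any particular paper): global self-attention over all nodes, where graph structure only enters through an optional positional encoding (e.g., Laplacian eigenvectors). The class and argument names are hypothetical.

```python
import torch
import torch.nn as nn


class GraphTransformerLayer(nn.Module):
    """Global self-attention over all nodes; graph structure only enters
    through the (optional) positional encoding added to node features.
    Illustrative sketch, not tied to a specific published architecture."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 2 * dim), nn.ReLU(), nn.Linear(2 * dim, dim)
        )
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x, pos_enc=None):
        # x: (batch, num_nodes, dim) node features.
        # pos_enc: optional (batch, num_nodes, dim) structural encoding,
        # e.g. Laplacian eigenvectors projected to dim. Without it, the layer
        # treats the graph as a bag of nodes (no structural inductive bias).
        if pos_enc is not None:
            x = x + pos_enc
        h, _ = self.attn(x, x, x)       # every node attends to every node
        x = self.norm1(x + h)
        x = self.norm2(x + self.ffn(x))
        return x


# Toy usage: 2 graphs padded to 10 nodes each, 32-dim node features.
x = torch.randn(2, 10, 32)
layer = GraphTransformerLayer(dim=32)
out = layer(x)                          # shape: (2, 10, 32)
```

Without `pos_enc`, the layer is permutation-invariant over nodes and closer to the "pure self-attention" end of the spectrum; injecting structural encodings moves it toward the GNN end.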
For a more detailed list of GT-related papers, you may check awesome graph transformers.
- Can this model (inductively) generalize to unseen nodes/graphs in a zero-shot/few-shot manner? Not yet.
- Are there any benefits from fine-tuning this model on a specific dataset? Or, can this pipeline generalize knowledge across different datasets? Yes, as demonstrated on some graph-level molecule datasets.
- How does this pipeline compare to models trained from scratch on each target dataset (like vanilla GNNs)? Better at the graph level; however, it cannot handle node-level tasks well, especially in the semi-supervised setting.