The MRT framework for generating evolution roadmaps of publications.

mrtframework

Demo Web Page | UI Library

Introduction

This is the Python code for generating an MRT (Master Reading Tree). The output JSON can be loaded with the React component react-mrt. You can also go to the demo page and click the Load Json button to upload the output JSON.

The AMiner system has already integrated this library and can generate MRTs for papers, so if you just want to view MRTs for papers, you can go to AMiner directly.

If you want to generate MRTs with customized settings, or dig deeper and substitute some of the modules, read the following sections.

Run scripts to generate your MRT

First, clone the mrtframework branch.

git clone git@github.com:THUDM/MRT.git -b mrtframework

Currently, this library supports SemanticScholar as its data source. To generate the MRT for a paper you are interested in, go to SemanticScholar and find the paper id of that paper. For example, the famous GPT-3 paper has the S2 paper id 6b85b63579a916f705a8e10a49bd8d849d91b1fc.

Then run the following script to generate the MRT for GPT-3.

python examples/generate_mrt_json.py \
--pub_id 6b85b63579a916f705a8e10a49bd8d849d91b1fc \
--output_path outputs/gpt-3.json

The output MRT will be saved as a JSON file at outputs/gpt-3.json.

Several parameters alter the generation process. For example, you can set --use_sbert=0 to disable Sentence-BERT and use only TF-IDF during generation. The full list of configurable parameters can be printed with

python examples/generate_mrt_json.py -h
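To generate MRTs for several papers in one go, the script can be driven from a small Python wrapper. The following is a minimal sketch that uses only the flags mentioned above (--pub_id, --output_path, --use_sbert). The helper names build_command and generate_all are hypothetical, and the sketch assumes it is run from the root of the cloned repository.

```python
import subprocess
import sys
from pathlib import Path


def build_command(pub_id, output_dir="outputs", use_sbert=True):
    """Build the argument list for one run of examples/generate_mrt_json.py."""
    output_path = Path(output_dir) / f"{pub_id}.json"
    return [
        sys.executable,
        "examples/generate_mrt_json.py",
        "--pub_id", pub_id,
        "--output_path", output_path.as_posix(),
        f"--use_sbert={int(use_sbert)}",
    ]


def generate_all(pub_ids, output_dir="outputs", use_sbert=True):
    """Run the script once per paper id, stopping at the first failure."""
    # Creating the directory up front is a precaution; the script may do this itself.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for pub_id in pub_ids:
        subprocess.run(build_command(pub_id, output_dir, use_sbert), check=True)


# Example (performs real API calls, so it is left commented out):
# generate_all(["6b85b63579a916f705a8e10a49bd8d849d91b1fc"], use_sbert=False)
```

Each run is a separate process, so a failure for one paper does not corrupt the output of the papers already generated.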

Note that SemanticScholar rate-limits its API. Generating an MRT triggers many API calls, so you may hit the rate limit when using the SemanticScholar data source. Any use of the web API must follow SemanticScholar's agreements.
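One common way to cope with rate limits is to retry failed calls with exponential backoff. The sketch below is a generic wrapper and not part of mrtframework. The name with_backoff is hypothetical, and because the exception raised on a rate-limit response is not documented here, the wrapper coarsely retries on any Exception.

```python
import time


def with_backoff(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Wrap func so that failed calls are retried with exponential backoff."""
    def wrapper(*args, **kwargs):
        for attempt in range(max_attempts):
            try:
                return func(*args, **kwargs)
            except Exception:
                # Assumption: any exception may be a rate-limit error.
                # Narrow this to the real exception type once it is known.
                if attempt == max_attempts - 1:
                    raise
                # Wait base_delay, then 2x, 4x, ... before the next attempt.
                sleep(base_delay * 2 ** attempt)
    return wrapper
```

Such a wrapper could be applied to a custom downloader function of the kind described in the "Use customized data sources" section before passing it to DataProvider.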

Use the Python library instead of cloning the code

The mrtframework package has been published to PyPI, so you can install the library and call it directly.

# Install the library
pip install mrtframework
# Calculate the MRT for the GPT-3 paper with SemanticScholar as the data source
from mrtframework import MasterReadingTree
from mrtframework.data_provider import DataProvider
provider = DataProvider(downloader='s2')
query_pub = provider.get('6b85b63579a916f705a8e10a49bd8d849d91b1fc')
mrt = MasterReadingTree(provider=provider, query_pub=query_pub)
print(mrt.to_json())
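Instead of printing the result, you will usually want to write it to a file that the demo page can load. The sketch below is one way to do that. The helper save_mrt_json is hypothetical, and since the return type of mrt.to_json() is not stated here, it accepts either a JSON string or a JSON-serializable object.

```python
import json
from pathlib import Path


def save_mrt_json(payload, path):
    """Write an MRT payload to path; payload may be a JSON string or a serializable object."""
    text = payload if isinstance(payload, str) else json.dumps(payload, ensure_ascii=False)
    json.loads(text)  # fail early if the text is not valid JSON
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(text, encoding="utf-8")
    return out


# Example, assuming `mrt` was built as in the snippet above:
# save_mrt_json(mrt.to_json(), "outputs/gpt-3.json")
```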

Use customized data sources

If you want to use another data source, you can write your own downloader for MRT to use, as follows:

from typing import Optional

from mrtframework.data_provider import DataProvider


def customized_downloader(pid: str) -> Optional[dict]:
    # do something here like retrieving data
    return {
        '_id': pid,
        'id': pid,
        'title': 'MRT: Tracing the Evolution of Scientific Publications',
        'abstract': 'The fast development of science and technology is accompanied by the booming of cutting edge research. Researchers need to digest more and more recently published publications in order to keep themselves up to date. This becomes tough in particular with the prevalence of preprint publishing such as arXiv, where inspiring works could come out without being peer-reviewed. Is that possible to design an automatic system to help researchers quickly gain a glimpse of a piece of work or gain useful background knowledge for deeply understanding it? To this end, we proposed a practical framework called Master Reading Tree (MRT) to trace the evolution of scientific publications. In this framework, we can build annotated evolution roadmaps for publications and identify important previous works or evolution tracks by generating expressive embeddings and clustering them into various groups. With comprehensive evaluations, our proposed framework demonstrates its superior capability in capturing underlying relations behind publications over several baseline algorithms. Finally, we integrated the proposed MRT framework on AMiner, an online academic platform, where users can generate roadmaps using MRT for free and their interactions are further used to refine the model.',
        'citations': [101, 102, 103], # the pids of citation papers
        'references': [104, 105, 106], # the pids of reference papers
        'year': 2021,
        'venue': 'TKDE',
        'authors': [{
            'name': 'Da Yin'
        }, {
            'name': 'Weng Lam Tam'
        }, {
            'name': 'Ming Ding'
        }, {
            'name': 'Jie Tang'
        }]
    }
# replace the downloader in provider
provider = DataProvider(downloader=customized_downloader)
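As a concrete variant, a downloader can serve records from an in-memory dictionary (for example, one loaded from a local file). This is a minimal sketch: make_dict_downloader and REQUIRED_FIELDS are hypothetical names, the field list simply mirrors the keys of the example above, and returning None for an unknown id is an assumption based on the Optional[dict] return type.

```python
from typing import Callable, Dict, Optional

# Assumption: these mirror the keys in the example record; the library may need fewer.
REQUIRED_FIELDS = (
    "id", "title", "abstract", "citations", "references", "year", "venue", "authors",
)


def make_dict_downloader(records: Dict[str, dict]) -> Callable[[str], Optional[dict]]:
    """Return a downloader that looks paper ids up in `records`."""
    def downloader(pid: str) -> Optional[dict]:
        record = records.get(pid)
        if record is None:
            return None  # unknown paper id
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            raise ValueError(f"record {pid!r} is missing fields: {missing}")
        # Add '_id' if the record does not already carry one.
        return {"_id": pid, **record}
    return downloader


# Example, assuming `records` maps paper ids to dictionaries:
# provider = DataProvider(downloader=make_dict_downloader(records))
```

Validating the fields inside the downloader surfaces incomplete records at lookup time rather than somewhere deeper in the generation process.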
