Random Walk TripleRush: Asynchronous Graph Querying and Sampling

Most Semantic Web applications rely on querying graphs, typically by using SPARQL with a triple store. Increasingly, applications also analyze properties of the graph structure to compute statistical inferences. The current Semantic Web infrastructure, however, does not efficiently support such operations. This forces developers to extract the relevant data for external statistical post-processing.

In this paper we propose to rethink query execution in a triple store as a highly parallelized asynchronous graph exploration on an active index data structure. This approach also makes it possible to integrate SPARQL querying with the sampling of graph properties.

To evaluate this architecture we implemented Random Walk TripleRush, which is built on a distributed graph processing system. Our evaluations show that this architecture enables both competitive graph querying and the execution of various types of random walks with restarts that sample interesting graph properties. Thanks to the asynchronous architecture, first results are sometimes returned in a fraction of the full execution time. We also evaluate the scalability and show that the architecture supports fast query times on a dataset with more than a billion triples.
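
To make the idea of a random walk with restarts concrete, here is a minimal single-machine sketch in Python. It is not the paper's implementation, which runs on a distributed graph processing system. The toy triples, the function names, and the parameters (`steps`, `restart_probability`, `seed`) are all assumptions chosen for illustration. The walk follows outgoing edges from subject to object, jumps back to the start node with a fixed probability (or when it reaches a node without outgoing edges), and counts visits as a rough estimate of each node's proximity to the start node.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy triples (subject, predicate, object); not taken from the paper.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "knows", "alice"),
    ("alice", "worksAt", "uzh"),
    ("bob", "worksAt", "uzh"),
]


def build_adjacency(triples):
    """Map each subject to the objects reachable over one outgoing edge."""
    adjacency = defaultdict(list)
    for subject, _predicate, obj in triples:
        adjacency[subject].append(obj)
    return adjacency


def random_walk_with_restart(triples, start, steps, restart_probability, seed=0):
    """Count node visits of a walk that returns to `start` with the given
    probability, or whenever the current node has no outgoing edge."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same counts
    adjacency = build_adjacency(triples)
    visits = Counter()
    current = start
    for _ in range(steps):
        visits[current] += 1
        neighbors = adjacency.get(current, [])
        if not neighbors or rng.random() < restart_probability:
            current = start  # restart
        else:
            current = rng.choice(neighbors)  # follow a random outgoing edge
    return visits


visits = random_walk_with_restart(
    TRIPLES, "alice", steps=10_000, restart_probability=0.15
)
total = sum(visits.values())
for node, count in visits.most_common():
    print(f"{node}: {count / total:.3f}")
```

The normalized visit counts approximate how strongly each node is connected to the start node under this walk model. A higher restart probability keeps the sample closer to the start node, while a lower one lets the walk explore more of the graph.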

Full Paper:
Stutz P., Paudel B., Verman C.M., Bernstein A. (2015). Random-Walk TripleRush: Asynchronous Graph Querying and Sampling. In: 24th International World Wide Web Conference (WWW 2015). Conference or workshop paper published in proceedings.

