
Bibtex Entry Extractor/Subsetter


UPDATE: (2014-01-21) This script was further updated by Dr. Florian Kluge of Universität Augsburg.
For the new script, please visit:

Suppose you have a LaTeX file (e.g. latexfile.tex) that cites some BibTeX entries (in the form of \citet{zhang2007,dellarocas2006}, etc.). Suppose these entries live in a huge BibTeX file (e.g. references.bib) that contains more than 1000 entries.

When you are done with the LaTeX file, you want to send out the paper to a journal. One way is to include this “references.bib” file with the latexfile.tex file. However, it is very difficult for the editors of the journal to use your huge file. It would be ideal if there could be a program to extract the right subset of references from the references.bib file and create a specific bibtex file for your article (latexfile.tex).

This is a real issue for me, since my co-author Feng Zhu has started to manage all his references in one big file. Carrying this one file around is a relatively small burden, but with multiple collaborators it becomes a serious issue. I cannot simply copy all the references to another co-author, and if everyone does the same thing, the shared BibTeX file will become very large and extremely hard to keep in sync among collaborators.

So I wrote the following program in Perl. Here is the introduction from the file:


(Rename the file to extractbib.pl after downloading)

Version: 1.0


This program traverses all citations in one LaTeX file (e.g. latexfile.tex), then goes to a big BibTeX file (e.g. references.bib), extracts only those papers that are cited in the LaTeX file, and outputs a new BibTeX file (e.g. latexfile.bib) containing just that subset.

One way to use it is to manage all your references in one big file (online or offline). When a paper is finished, the author can run this program to get a small BibTeX file that can be sent to a journal.

I guess this is often needed; however, I have not found a good solution so far, so here is mine. Handling all the different cases is fairly complicated. I’ll try to update it when I find a need. If you have any suggestions, please let me know.

Xiaoquan (Michael) Zhang

Assistant Professor, Hong Kong University of Science and Technology

July 04, 2007

zhangxiaoquan (a) gmail.com

Usage: perl extractbib.pl latexfile.tex references.bib [output.bib]

latexfile.tex is the original tex file

references.bib is the bibtex file containing all the references

output.bib contains the subset of references that appear in the tex file

(If the output filename “output.bib” is omitted, the program will generate a bibtex file with name: latexfile.bib)
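
To make the idea in the header above concrete, here is a minimal sketch in Python. It is not the Perl script itself, only an illustration of the same approach under simplifying assumptions: citation commands all start with \cite, optional arguments contain no nested brackets, and entries in the .bib file have balanced braces. The function names (cited_keys, bib_entries, subset_bib) are made up for this example.

```python
import re

# Matches \cite, \citet, \citep, ... with any number of optional [..] arguments.
# Assumption: optional arguments do not contain a nested "]".
CITE_RE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}")

# Matches the start of an entry such as "@article{zhang2007," and captures the key.
ENTRY_START_RE = re.compile(r"@\w+\s*\{\s*([^,\s{}]+)\s*,")


def cited_keys(tex_source):
    """Return the set of citation keys used in a LaTeX source string."""
    keys = set()
    for match in CITE_RE.finditer(tex_source):
        for key in match.group(1).split(","):
            key = key.strip()
            if key:
                keys.add(key)
    return keys


def bib_entries(bib_source):
    """Map each entry key to its full text, found by counting braces."""
    entries = {}
    pos = 0
    while True:
        match = ENTRY_START_RE.search(bib_source, pos)
        if match is None:
            break
        depth = 0
        end = None
        for i in range(match.start(), len(bib_source)):
            ch = bib_source[i]
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:  # closing brace of the whole entry
                    end = i + 1
                    break
        if end is None:  # unbalanced braces: stop rather than guess
            break
        entries[match.group(1)] = bib_source[match.start():end]
        pos = end
    return entries


def subset_bib(tex_source, bib_source):
    """Return only the entries of bib_source that are cited in tex_source."""
    wanted = cited_keys(tex_source)
    entries = bib_entries(bib_source)
    kept = [entries[key] for key in sorted(wanted) if key in entries]
    return "\n\n".join(kept) + "\n"
```

Keys that are cited but missing from the big file are silently skipped here; a real tool would probably want to warn about them.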

Download the file (extractbib.pl) Here…



After posting this, I thought about some ideas to improve this.

  1. The easiest way is to implement a web interface for this program. I can do two possible things:
    1. post two “text areas” for people to copy and paste latex article and the bibtex file. I can return the result to a new text box.
    2. post two tabs for people to “Browse” and upload the files, and return the result to a text box as well as to a link to the bibtex file.
  2. Feng Zhu suggested writing a macro for WinEdt. I can foresee this being very popular, but I don’t have time for that. Besides, I’m not a big fan of WinEdt; I use BaKoMa and LyX more often.

Austin posted the following program that extracts bibtex entries from the aux file. (bibsubset.pl)
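
For readers curious what the aux-file approach looks like, here is a rough Python sketch. It is not Austin's bibsubset.pl, just an illustration of the idea: when LaTeX runs with classic BibTeX, it writes one \citation{...} line to the .aux file for each citation, so the keys can be read from there without parsing the tex source at all. The function name keys_from_aux is made up for this example.

```python
import re

# Each citation shows up in the .aux file as \citation{key} or \citation{key1,key2}.
AUX_CITATION_RE = re.compile(r"\\citation\{([^}]*)\}")


def keys_from_aux(aux_source):
    """Return the cited keys from the text of an .aux file, in first-seen order."""
    keys = []
    seen = set()
    for match in AUX_CITATION_RE.finditer(aux_source):
        for key in match.group(1).split(","):
            key = key.strip()
            if key and key not in seen:
                seen.add(key)
                keys.append(key)
    return keys
```

The appeal of this route is that LaTeX has already resolved every citation command, including ones pulled in from other tex files, so no knowledge of \cite variants is needed.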

Update (2011-01-07):

Vincent Guillet (Vincent.Guillet {a} ias.u-psud.fr) sent me a modified version of my program. I’ve updated the link so that it now points to the new version. The three corrections are:
- erase [xx][xxx] after \cite, which made the code go wrong when } appears inside [], as in: \citep[$\d n(a) \propto a^{-3.5}\,\d a$,][]{MRN77}
- allow “^},” in the bib file without it signifying the end of the reference
- suppress the @ character in the bib output file, which makes my bibtex nervous …
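
The first correction is easy to illustrate. Below is a small Python sketch (not Vincent's actual Perl change) that removes the optional [..] arguments right after a \cite-style command, so that a } inside the brackets can no longer be mistaken for the end of the key list. It assumes the optional arguments contain no nested ] characters, and the function name strip_cite_options is made up for this example.

```python
import re

# A \cite-style command followed by one or more optional [..] arguments.
# Assumption: no "]" appears inside an optional argument.
OPTIONAL_ARGS_RE = re.compile(r"(\\cite[a-zA-Z]*\*?)\s*(?:\[[^\]]*\]\s*)+")


def strip_cite_options(tex_source):
    """Drop the optional arguments after \\cite commands, keeping the {keys} part."""
    return OPTIONAL_ARGS_RE.sub(r"\1", tex_source)
```

Applied to the example above, this turns \citep[$\d n(a) \propto a^{-3.5}\,\d a$,][]{MRN77} into \citep{MRN77}, after which the keys can be extracted safely.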

10 Responses to “Bibtex Entry Extractor/Subsetter”

  1. Austin Che Says:

    I was looking for something exactly like this. Unfortunately all my files are spread around different tex files and parsing the source tex file doesn’t seem the right way to go. I put together a much simpler version that just parses the generated .aux file and subsets the bib file: http://austinche.name/misc/bibsubset.pl

  2. ET Says:

    Thx! Austin. Yours is very cool!
    I noticed your IP from MIT. Say hi to eastgate for me, my wife was missing 27C just now. She obviously enjoyed the time there while I was busy working on the problem sets for 14.271, 6.262, 15.575, etc.

  3. Brian Says:

    Nice job. There is a better option, though.

    To extract bibtex entries from the aux file you can use JabRef. There is one item under menu “Tool” called “New subdatabase based on AUX file”.
    Quite convenient.

  4. ET Says:

    using aux file is indeed a good idea.

  5. hectorpal Says:

    There is also “bibtool”,
    which can do this extraction. I installed it using apt-get in Ubuntu.
    The main drawback of bibtool is that when the aux file includes other aux files, BibTeX items get repeated.

  6. bdp Says:

    If you edit your latex files using emacs and the excellent auctex mode with its partner reftex, there is a menu entry (“Ref -> Global Actions -> Create BibTeX File”) which does this for you.

  7. Willi Says:

    Brilliant script! Thanks for sharing.

  8. Krishna Says:

    Your script really helped me. Thank you.

  9. Abbie Kressner Says:

    This is SUPER handy! Thanks for sharing!

  10. travers Says:

    Hurrah! Finally I got a website from where I be able to actually get helpful facts
    regarding my study and knowledge.


Copyright Xiaoquan (Michael) Zhang, 2004-2020. All rights reserved.
All trademarks property of their owners.