Bibtex Entry Extractor/Subsetter
By ET
Suppose you have a LaTeX file (e.g. latexfile.tex) with some citation entries (in the form of \citet{zhang2007,dellarocas2006}, etc.). Suppose these entries can be found in a huge bibtex file (e.g. references.bib) that includes more than 1000 bibtex entries.
When you are done with the LaTeX file, you want to send out the paper to a journal. One way is to include this “references.bib” file with the latexfile.tex file. However, it is very difficult for the editors of the journal to use your huge file. It would be ideal if there could be a program to extract the right subset of references from the references.bib file and create a specific bibtex file for your article (latexfile.tex).
This is a real issue for me, since my co-author Feng Zhu started to manage all his references in one big file. Of course, it is relatively easy to carry this one file around; however, when I have multiple collaborators, this becomes a serious issue. I cannot just copy all the references to another co-author, and if everyone does the same thing, the reference bibtex file becomes very large and extremely hard to keep in sync among collaborators.
So I wrote the following program in Perl. Here is the introduction from the file:
FILE: EXTRACTBIB.PL
(Rename the file to extractbib.pl after downloading)
Version: 1.0
Description:
This program traverses all citations in one latex file (e.g. latexfile.tex), then goes to a big bibtex file (e.g. references.bib), extracts only those papers that appear in the latex file, and outputs a new bibtex file (e.g. latexfile.bib) with the subset of papers that appear in the tex file.
One way to use it is to manage all the references in one big file (online or offline). When a paper is finished, the author can run this program to get a small bibtex file that can be sent to a journal.
I guess this is often needed; however, I have not found a good solution so far, so here is mine. Handling all the different cases is fairly complicated. I’ll try to update it when I find a need. If you have any suggestions, please let me know.
Xiaoquan (Michael) Zhang
Assistant Professor, Hong Kong University of Science and Technology
July 04, 2007
zhangxiaoquan (a) gmail.com
Usage: perl extractbib.pl latexfile.tex references.bib [output.bib]
latexfile.tex is the original tex file
references.bib is the bibtex file containing all the references
output.bib contains the subset of references that appear in the tex file
(If the output filename “output.bib” is omitted, the program will
generate a bibtex file with name: latexfile.bib)
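The workflow described above (collect the cited keys from the tex file, then copy only the matching entries out of the big bibtex file) can be sketched in a few lines of Python. This is only an illustration of the idea, not the extractbib.pl code itself: the function names cited_keys, split_entries and subset_bib are hypothetical, the default output name (the tex filename with a .bib extension) is an assumption, and the sketch only handles brace-delimited entries such as @article{key, ...}.

```python
import os
import re
import sys

# Matches \cite, \citet, \citep, \citeauthor, ... with optional [..] arguments.
CITE_RE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}")
# Matches the start of a brace-delimited entry, e.g. "@article{zhang2007,".
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")


def cited_keys(tex_source):
    """Return the set of citation keys used in a LaTeX source string."""
    # Drop LaTeX comments (an unescaped % up to the end of the line).
    tex_source = re.sub(r"(?<!\\)%.*", "", tex_source)
    keys = set()
    for match in CITE_RE.finditer(tex_source):
        for key in match.group(1).split(","):
            if key.strip():
                keys.add(key.strip())
    return keys


def split_entries(bib_source):
    """Yield (key, entry_text) pairs, finding each entry's end by brace depth."""
    for start in ENTRY_RE.finditer(bib_source):
        depth = 0
        for pos in range(start.start(), len(bib_source)):
            if bib_source[pos] == "{":
                depth += 1
            elif bib_source[pos] == "}":
                depth -= 1
                if depth == 0:
                    yield start.group(2), bib_source[start.start():pos + 1]
                    break


def subset_bib(tex_source, bib_source):
    """Return the text of only those bibtex entries that the tex source cites."""
    wanted = cited_keys(tex_source)
    kept = [text for key, text in split_entries(bib_source) if key in wanted]
    return "\n\n".join(kept) + "\n"


if __name__ == "__main__" and len(sys.argv) >= 3:
    tex_path, bib_path = sys.argv[1], sys.argv[2]
    # Assumed default: same base name as the tex file, with a .bib extension.
    default_out = os.path.splitext(tex_path)[0] + ".bib"
    out_path = sys.argv[3] if len(sys.argv) > 3 else default_out
    if os.path.abspath(out_path) == os.path.abspath(bib_path):
        raise SystemExit("refusing to overwrite the input bibtex file")
    with open(tex_path, encoding="utf-8") as f:
        tex_source = f.read()
    with open(bib_path, encoding="utf-8") as f:
        bib_source = f.read()
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(subset_bib(tex_source, bib_source))
```

Saved under a hypothetical name such as extractbib_sketch.py, it would be called the same way as the Perl program: python extractbib_sketch.py latexfile.tex references.bib [output.bib]. Cases such as @string definitions, parenthesis-delimited entries and crossref fields are deliberately left out, which is part of what makes a complete solution complicated.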
Download the file (extractbib.pl) Here…
P.S.
After posting this, I thought of some ideas to improve it.
- The easiest way is to implement a web interface for this program. I can do two possible things:
- post two “text areas” for people to paste the latex article and the bibtex file into. I can return the result in a new text box.
- post two tabs for people to “Browse” and upload the files, and return the result in a text box as well as a link to the bibtex file.
- Feng Zhu suggested writing a macro for WinEdt. I can foresee this being very popular, but I don’t have time for that. Besides, I’m not a big fan of WinEdt. I use BaKoMa and LyX more often.
Austin posted the following program that extracts bibtex entries from the aux file. (bibsubset.pl)
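For readers unfamiliar with the aux-file approach, here is a rough Python sketch of its first step. It is not Austin's bibsubset.pl (that program is linked in the comments below); it assumes the classic BibTeX workflow, in which running LaTeX writes one \citation{...} line into the .aux file for every citation command, and the function name aux_cited_keys is hypothetical.

```python
import re


def aux_cited_keys(aux_source):
    """Return the set of keys found on \\citation{...} lines of a .aux file."""
    keys = set()
    # Each citation command in the document produces a line like \citation{a,b}.
    for match in re.finditer(r"^\\citation\{([^}]*)\}", aux_source, flags=re.MULTILINE):
        for key in match.group(1).split(","):
            if key.strip():
                keys.add(key.strip())
    return keys
```

The appeal of this route is that LaTeX has already expanded the document, so citations that live in other tex files pulled in by the main file end up recorded without any parsing of the tex source. The resulting key set can then be used to filter the big bibtex file in the usual way.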

June 3rd, 2008 at 1:46 am
Thx! Austin. Yours is very cool!
I noticed your IP from MIT. Say hi to eastgate for me, my wife was missing 27C just now. She obviously enjoyed the time there while I was busy working on the problem sets for 14.271, 6.262, 15.575, etc.
June 2nd, 2008 at 9:30 am
I was looking for something exactly like this. Unfortunately all my files are spread around different tex files and parsing the source tex file doesn’t seem the right way to go. I put together a much simpler version that just parses the generated .aux file and subsets the bib file: http://austinche.name/misc/bibsubset.pl