Suppose you have a LaTeX file (e.g. latexfile.tex) with some citation entries (in the form of \citet{zhang2007,dellarocas2006}, etc). Suppose these entries can be found in a huge bibtex file (e.g. references.bib) that includes more than 1000 bibtex entries.
When you are done with the LaTeX file, you want to send the paper out to a journal. One way is to include this “references.bib” file with the latexfile.tex file. However, a huge file like that is very inconvenient for the journal's editors. Ideally, a program would extract the right subset of references from the references.bib file and create a bibtex file specific to your article (latexfile.tex).
This is a real issue for me, since my co-author Feng Zhu started to manage all his references in one big file. Of course, one file is relatively easy to carry around. With multiple collaborators, however, it becomes a serious problem. I cannot just copy all the references to another co-author, and if everyone does the same thing, the shared bibtex file becomes very large and extremely hard to keep in sync among collaborators.
So I wrote the following program in Perl. Here is the introduction from the file:
FILE: EXTRACTBIB.PL
(Rename the file to extractbib.pl after downloading)
Version: 1.0
Description:
This program collects all citations in one LaTeX file (e.g. latexfile.tex), then goes to a big bibtex file (e.g. references.bib), extracts only those papers that are cited in the LaTeX file, and outputs a new bibtex file (e.g. latexfile.bib) containing that subset.
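The steps in this description can be sketched roughly as follows. This is a minimal Python illustration, not the Perl code in extractbib.pl: the function names (cited_keys, split_entries, extract_subset) are made up for the example, and it assumes brace-delimited entries of the form @article{key, ...} and exact, case-sensitive key matching.

```python
import re

# Matches \cite, \citet, \citep, ... with optional [..] arguments before the keys.
CITE_RE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}")

# Matches the start of an entry such as "@article{zhang2007,".
# @string{...} and @comment{...} do not match because no comma follows a key.
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s{}]+)\s*,")


def cited_keys(tex_source):
    """Return the set of citation keys used in a LaTeX source string."""
    keys = set()
    for group in CITE_RE.findall(tex_source):
        keys.update(k.strip() for k in group.split(",") if k.strip())
    return keys


def split_entries(bib_source):
    """Yield (key, entry_text) pairs, finding each entry's end by brace depth."""
    pos = 0
    while True:
        m = ENTRY_RE.search(bib_source, pos)
        if m is None:
            return
        depth = 0
        end = None
        for i in range(m.start(), len(bib_source)):
            ch = bib_source[i]
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    end = i + 1
                    break
        if end is None:
            return  # unbalanced braces: stop rather than guess
        yield m.group(2), bib_source[m.start():end]
        pos = end


def extract_subset(bib_source, keys):
    """Return (bibtex text for the cited entries, sorted list of keys not found)."""
    found = [(k, text) for k, text in split_entries(bib_source) if k in keys]
    missing = sorted(set(keys) - {k for k, _ in found})
    return "\n\n".join(text for _, text in found) + "\n", missing
```

Calling extract_subset(bib_text, cited_keys(tex_text)) returns the reduced bibtex text plus a list of cited keys that were not found in the big file, which is worth printing as a warning. Entries written with parentheses, or citations hidden inside custom macros, are cases this sketch does not cover.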
One way to use it is to manage all the references in one big file (online or offline). When a paper is finished, the author can run this program to get a small bibtex file that can be sent to a journal.
I guess this is often needed, but I have not found a good solution so far, so here is mine. Handling all the different cases is fairly complicated. I’ll try to update it when I find a need. If you have any suggestions, please let me know.
Xiaoquan (Michael) Zhang
Assistant Professor, Hong Kong University of Science and Technology
July 04, 2007
zhangxiaoquan (a) gmail.com
Usage: perl extractbib.pl latexfile.tex references.bib [output.bib]
latexfile.tex is the original tex file
references.bib is the bibtex file containing all the references
output.bib contains the subset of references that appear in the tex file
(If the output filename “output.bib” is omitted, the program will
generate a bibtex file with name: latexfile.bib)
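For illustration, the argument handling in this usage line might look like the sketch below. It is written in Python rather than Perl and is not taken from extractbib.pl. The helper name resolve_paths is hypothetical, and it assumes the default output simply swaps the .tex extension for .bib, as the latexfile.bib example in the description suggests. The check against overwriting an input file is an extra safety measure that the original program may or may not have.

```python
from pathlib import Path


def resolve_paths(argv):
    """Map [tex, bib, optional out] to three Paths; out defaults to <tex stem>.bib."""
    if len(argv) not in (2, 3):
        raise SystemExit("Usage: latexfile.tex references.bib [output.bib]")
    tex, bib = Path(argv[0]), Path(argv[1])
    out = Path(argv[2]) if len(argv) == 3 else tex.with_suffix(".bib")
    # Never write over one of the inputs (e.g. when the big file is latexfile.bib).
    if out.resolve() in (tex.resolve(), bib.resolve()):
        raise SystemExit(f"Refusing to overwrite input file: {out}")
    return tex, bib, out
```

In a script this would be called as resolve_paths(sys.argv[1:]) after importing sys.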
Download the file (extractbib.pl) Here…
P.S.
After posting this, I thought about some ideas to improve this.
- The easiest way is to implement a web interface for this program. I can do two possible things:
  - provide two “text areas” where people can paste the latex article and the bibtex file, and return the result in a new text box.
  - provide two “Browse” fields where people can upload the files, and return the result in a text box as well as through a link to the bibtex file.
- Feng Zhu suggested writing a macro for WinEDT. I can see this being very popular, but I don’t have time for that. Besides, I’m not a big fan of WinEDT. I use Bakoma and LyX more often.
Austin posted the following program that extracts bibtex entries from the aux file. (bibsubset.pl)
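The aux-file approach works because LaTeX writes a \citation{...} line to latexfile.aux for every citation command it processes, so the keys can be read from there instead of being parsed out of the tex source. Below is a minimal Python sketch of that idea. It is not Austin's bibsubset.pl, and the function name keys_from_aux is made up for the example. It assumes the classic BibTeX workflow; other backends may record citations differently.

```python
import re


def keys_from_aux(aux_source):
    """Return citation keys from .aux text, in first-seen order, without repeats."""
    keys = []
    for group in re.findall(r"\\citation\{([^}]*)\}", aux_source):
        for key in group.split(","):
            key = key.strip()
            # "*" is written by \nocite{*} and means "every entry", not a real key.
            if key and key != "*" and key not in keys:
                keys.append(key)
    return keys
```

One advantage of this route is that citations produced by macros or included files are picked up as well, since they all end up in the aux file after a LaTeX run.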