Compound search in texts with Document Extractor

Version 5.5.0.1

Document Extractor is an API class that searches for compound names (IUPAC or traditional) in text-based file types such as HTML, TXT, XML and PDF, and converts them to chemical structures. The class can also be called from the command line. In that case it expects the name of a plain text file as the first argument, or reads from standard input when no argument is given. The list of hits is printed to standard output.
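The command-line convention described above (file name as first argument, standard input as fallback, hits printed to standard output) can be illustrated with a small stand-alone sketch. This is not Document Extractor and does not use its API: the names `KNOWN_NAMES`, `read_input`, `find_hits` and `main` are hypothetical, the three-entry lookup table is a toy stand-in for real name-to-structure conversion, and the tab-separated output format is an assumption made for illustration only.

```python
import re
import sys

# Hypothetical lookup table standing in for real name-to-structure conversion:
# lowercase compound name -> SMILES string.
KNOWN_NAMES = {
    "aspirin": "CC(=O)OC1=CC=CC=C1C(O)=O",
    "ethanol": "CCO",
    "benzene": "c1ccccc1",
}


def read_input(argv):
    """Return the text of the file named by the first argument, or stdin if absent."""
    if len(argv) > 1:
        with open(argv[1], encoding="utf-8") as handle:
            return handle.read()
    return sys.stdin.read()


def find_hits(text, names=KNOWN_NAMES):
    """Return (position, matched text, SMILES) tuples in order of appearance."""
    if not names:
        return []
    # Match any known name as a whole word, ignoring case.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    return [
        (match.start(), match.group(0), names[match.group(0).lower()])
        for match in pattern.finditer(text)
    ]


def main(argv):
    """Print one hit per line: character offset, matched name, structure."""
    for position, name, smiles in find_hits(read_input(argv)):
        print(f"{position}\t{name}\t{smiles}")
```

Calling `main(sys.argv)` at the end of a script would give it the same calling pattern as described for the command-line mode: pass a plain text file name, or pipe text in when no argument is given.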

API documentation

See the API documentation here.