Package org.jsoup.examples
Class HtmlToPlainText
java.lang.Object
org.jsoup.examples.HtmlToPlainText
HTML to plain-text. This example program demonstrates the use of jsoup to convert HTML input to lightly formatted
plain text. That differs from the general goal of jsoup's .text() methods, which is to extract clean data from a
scrape.
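To make the contrast with .text() concrete, here is a minimal sketch that runs both on the same two-paragraph snippet. It assumes the jsoup jar (which contains org.jsoup.examples) is on the classpath. TextVersusPlainText, scraped and formatted are hypothetical names. Roughly, .text() should collapse the paragraphs onto one line, while getPlainText should keep a line break between them. The exact whitespace of the formatted output may vary between jsoup versions.

```java
import org.jsoup.Jsoup;
import org.jsoup.examples.HtmlToPlainText;
import org.jsoup.nodes.Element;

public class TextVersusPlainText {
    // Hypothetical helper: jsoup's .text(), whitespace-normalized single-line text for data extraction
    static String scraped(String html) {
        return Jsoup.parse(html).body().text();
    }

    // Hypothetical helper: HtmlToPlainText.getPlainText(), lightly formatted output with line breaks
    static String formatted(String html) {
        Element body = Jsoup.parse(html).body();
        return new HtmlToPlainText().getPlainText(body);
    }

    public static void main(String[] args) {
        String html = "<p>First paragraph.</p><p>Second paragraph.</p>";
        System.out.println("text():         " + scraped(html));
        System.out.println("getPlainText(): " + formatted(html));
    }
}
```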
Note that this is a fairly simplistic formatter; for real-world use you'll want to embrace and extend it.
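One way to extend the idea is to write your own node visitor. The sketch below is a hypothetical formatter, not the class's own implementation. It assumes a recent jsoup release in which NodeTraversor.traverse(NodeVisitor, Node) is a static method, and the names BulletFormatter, format and formatHtml are illustrative only. It starts a "* " bullet line for each li element and adds a line break after p and br elements.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.NodeTraversor;
import org.jsoup.select.NodeVisitor;

public class BulletFormatter {
    // Hypothetical visitor: collects text, bullets list items, breaks lines after p and br
    static class Visitor implements NodeVisitor {
        private final StringBuilder out = new StringBuilder();

        @Override
        public void head(Node node, int depth) {
            if (node instanceof TextNode) {
                out.append(((TextNode) node).text()); // user-visible text lives in TextNodes
            } else if (node.nodeName().equals("li")) {
                out.append("\n* "); // each list item starts a new bullet line
            }
        }

        @Override
        public void tail(Node node, int depth) {
            String name = node.nodeName();
            if (name.equals("p") || name.equals("br")) {
                out.append('\n'); // end the line once the block has been visited
            }
        }

        @Override
        public String toString() {
            return out.toString().trim();
        }
    }

    // Walks root and all of its descendants with the visitor above
    static String format(Element root) {
        Visitor visitor = new Visitor();
        NodeTraversor.traverse(visitor, root);
        return visitor.toString();
    }

    static String formatHtml(String html) {
        return format(Jsoup.parse(html).body());
    }

    public static void main(String[] args) {
        System.out.println(formatHtml("<p>Items:</p><ul><li>one</li><li>two</li></ul>"));
    }
}
```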
To invoke from the command line, assuming you've downloaded the jsoup jar to your current directory:
java -cp jsoup.jar org.jsoup.examples.HtmlToPlainText url [selector]
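The same url [selector] behavior can be sketched from Java code. This assumes the jsoup jar is on the classpath. The user-agent string and timeout below are placeholders, not the values of the class's own userAgent and timeout fields, and PlainTextFromCode, format and formatHtml are hypothetical names. The formatting step is kept separate from the network fetch so it can be exercised on an in-memory document.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.examples.HtmlToPlainText;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class PlainTextFromCode {
    // Placeholder values for this sketch; not taken from HtmlToPlainText
    private static final String USER_AGENT = "Mozilla/5.0 (example)";
    private static final int TIMEOUT_MILLIS = 5_000;

    // Formats each element matching selector, or the whole document when selector is null
    static List<String> format(Document doc, String selector) {
        HtmlToPlainText formatter = new HtmlToPlainText();
        List<String> results = new ArrayList<>();
        if (selector == null) {
            results.add(formatter.getPlainText(doc)); // a Document is itself an Element
        } else {
            for (Element element : doc.select(selector)) {
                results.add(formatter.getPlainText(element));
            }
        }
        return results;
    }

    // Offline variant: parse an HTML string instead of fetching a URL
    static List<String> formatHtml(String html, String selector) {
        return format(Jsoup.parse(html), selector);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1 || args.length > 2) {
            System.err.println("usage: PlainTextFromCode url [selector]");
            return;
        }
        // Network fetch: may throw IOException (connection failure, HTTP error status, ...)
        Document doc = Jsoup.connect(args[0])
                .userAgent(USER_AGENT)
                .timeout(TIMEOUT_MILLIS)
                .get();
        for (String text : format(doc, args.length == 2 ? args[1] : null)) {
            System.out.println(text);
        }
    }
}
```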
Nested Class Summary
Field Summary
Fields: userAgent, timeout
Constructor Summary
Constructors: HtmlToPlainText()
Method Summary
Modifier and Type    Method                           Description
String               getPlainText(Element element)    Format an Element to plain-text
static void          main(String... args)
Field Details
userAgent
See Also:
timeout
private static final int timeout
See Also:
Constructor Details
HtmlToPlainText
public HtmlToPlainText()
Method Details
main
Throws:
IOException
getPlainText
Format an Element to plain-text.
Parameters:
element - the root element to format
Returns:
formatted text
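Since element is described as the root element to format, calling getPlainText on a single element should format only that element and its descendants. A minimal sketch, assuming the jsoup jar is on the classpath; FormatSubtree and formatById are hypothetical names.

```java
import org.jsoup.Jsoup;
import org.jsoup.examples.HtmlToPlainText;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FormatSubtree {
    // Hypothetical helper: formats only the element with the given id
    static String formatById(String html, String id) {
        Document doc = Jsoup.parse(html);
        Element root = doc.getElementById(id);
        if (root == null) {
            throw new IllegalArgumentException("no element with id " + id);
        }
        // root is the formatting root; siblings and ancestors are not part of its subtree
        return new HtmlToPlainText().getPlainText(root);
    }

    public static void main(String[] args) {
        String html = "<h1>Site title</h1><div id=article><h2>Heading</h2><p>Body text.</p></div>";
        System.out.println(formatById(html, "article"));
    }
}
```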