Monday, July 30, 2012

Parsing, Tokenizing and Formatting


Parsing

This section reviews the approaches for finding, tokenizing, and formatting data.
Let's see the first example:


As you can see, the method compile of the class Pattern takes the pattern that you want to search for, and the method matcher of the resulting Pattern takes the source to search in and returns a Matcher.

!Important:

  a b a b a b a
  0 1 2 3 4 5 6

Why didn't the little program shown above print 0 2 4?
The reason is that the regex engine does not consider index 2 as the start of a match: that character was already consumed by the first match and cannot be reused. There are exceptions to this rule, and they will be shown soon.

  a b a
  0 1 2

Using Metacharacters

\d A digit
\s A whitespace character
\w A word character (letters, digits, or "_" (underscore))
 . Any character

Output
Matcher ==> 0
5 7 16
Matcher ==> 1
6 14
Matcher ==> 2
0 1 2 3 4 5 7 8 9 10 11 12 13 15 16
Matcher ==> 3
0 1 2
Matcher ==> 4
0 1 2 3 4 8 13
Matcher ==> 5
0 1 2 3 4 8 10 13 15
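
The program that produced the output above is not reproduced here. As a separate, smaller illustration of the same metacharacters, here is a sketch that assumes a sample source of "a 1 56 _Z"; the source string, the class name MetacharDemo, and the helper matchStarts are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetacharDemo {
    // Hypothetical helper: returns the start index of every match of regex in source.
    static List<Integer> matchStarts(String regex, String source) {
        Matcher m = Pattern.compile(regex).matcher(source);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String source = "a 1 56 _Z"; // assumed sample input, indexes 0 to 8
        // Backslashes are doubled because these are Java string literals.
        System.out.println(matchStarts("\\d", source)); // digits:      [2, 4, 5]
        System.out.println(matchStarts("\\s", source)); // whitespace:  [1, 3, 6]
        System.out.println(matchStarts("\\w", source)); // word chars:  [0, 2, 4, 5, 7, 8]
        System.out.println(matchStarts(".", source));   // any char:    [0, 1, 2, 3, 4, 5, 6, 7, 8]
    }
}
```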

Using Quantifiers

+ One or more occurrences
* Zero or more occurrences
? Zero or one occurrence
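
To see the three quantifiers at work, here is a small sketch. The sample strings, the class name QuantifierDemo, and the helper matches are assumptions chosen for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Hypothetical helper: collects each match as "start:matchedText".
    static List<String> matches(String regex, String source) {
        Matcher m = Pattern.compile(regex).matcher(source);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.start() + ":" + m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // + : one or more digits in a row
        System.out.println(matches("\\d+", "1 a12 234b"));      // [0:1, 3:12, 6:234]
        // * : an "a" followed by zero or more "b"
        System.out.println(matches("ab*", "a ab abbb"));        // [0:a, 2:ab, 5:abbb]
        // ? : the "u" may appear zero times or once
        System.out.println(matches("colou?r", "color colour")); // [0:color, 6:colour]
    }
}
```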


Greedy Quantifiers
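
By default the quantifiers +, * and ? are greedy: they match as much of the source as they can. Adding a ? after a quantifier makes it reluctant, so it matches as little as it can. Here is a sketch of the difference, assuming the sample source "yyxxxyxx"; the class name GreedyDemo and the helper matches are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // Hypothetical helper: collects each match as "start:matchedText".
    static List<String> matches(String regex, String source) {
        Matcher m = Pattern.compile(regex).matcher(source);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.start() + ":" + m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String source = "yyxxxyxx"; // assumed sample input
        // Greedy: .* grabs everything it can, so the whole source is one match.
        System.out.println(matches(".*xx", source));  // [0:yyxxxyxx]
        // Reluctant: .*? stops at the first "xx" it can reach, so there are two matches.
        System.out.println(matches(".*?xx", source)); // [0:yyxx, 4:xyxx]
    }
}
```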



Tokenizing

Tokenizing is the process of taking big pieces of source data, breaking them into
little pieces, and storing the little pieces in variables.
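
The simplest way to try this is String.split(), which takes a regex as the delimiter and returns the tokens in an array. This is a minimal sketch; the sample data, the class name SplitDemo, and the helper tokens are assumptions for the illustration.

```java
import java.util.Arrays;

public class SplitDemo {
    // Hypothetical helper: breaks source into tokens around each match of delimiterRegex.
    static String[] tokens(String source, String delimiterRegex) {
        return source.split(delimiterRegex);
    }

    public static void main(String[] args) {
        String[] pieces = tokens("ab,cd5b,6x,z4", ","); // comma is the delimiter
        System.out.println(pieces.length);              // prints 4
        System.out.println(Arrays.toString(pieces));    // prints [ab, cd5b, 6x, z4]
    }
}
```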

Tokenizing with Scanner
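
A Scanner breaks its source into tokens using whitespace as the default delimiter, and its hasNextXxx() methods let you test the type of the next token before reading it. Here is a sketch assuming the sample input "1 true 34 hi"; the class name ScannerDemo and the helper classify are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerDemo {
    // Hypothetical helper: labels each whitespace-separated token with the type it was read as.
    static List<String> classify(String source) {
        List<String> out = new ArrayList<>();
        Scanner s = new Scanner(source); // default delimiter is whitespace
        while (s.hasNext()) {
            if (s.hasNextInt()) {
                out.add("int:" + s.nextInt());
            } else if (s.hasNextBoolean()) {
                out.add("boolean:" + s.nextBoolean());
            } else {
                out.add("String:" + s.next());
            }
        }
        s.close();
        return out;
    }

    public static void main(String[] args) {
        System.out.println(classify("1 true 34 hi"));
        // prints [int:1, boolean:true, int:34, String:hi]
    }
}
```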


Formatting with printf() and format()

Both methods behave in exactly the same way, which means that anything we say about one of them applies to both.

Let's see how formatting works:

%[arg_index$][flags][width][.precision]conversion char

The values within [ ] are optional.

1. arg_index - An integer followed directly by a $. It indicates which argument should be printed in this position.

2. flags - While many flags are available, for the exam you'll need to know:
- "-" Left justify this argument
- "+" Include a sign (+ or -) with this argument
- "0" Pad this argument with zeroes
- "," Use locale-specific grouping separators (i.e., the comma in 123,456)
- "(" Enclose negative numbers in parentheses


3. width -  This value indicates the minimum number of characters to print. (If you
want nice even columns, you'll use this value extensively.)

4. precision - For the exam you'll only need this when formatting a floating-point
number, and in the case of floating point numbers, precision indicates the number of
digits to print after the decimal point.

5. conversion - The type of argument you'll be formatting. You'll need to know:
- b boolean
- c char
- d integer
- f floating point
- s string
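
The pieces of the format string described above can be tried out with a sketch like this one. It goes through String.format(), which uses the same format syntax as printf() and format(), so the results can be returned and checked. The helper fmt and the class name FormatDemo are made up for this example, and Locale.US is pinned as an assumption so that the "," flag gives the same result on every machine.

```java
import java.util.Locale;

public class FormatDemo {
    // Hypothetical helper: formats with a fixed locale so grouping separators are predictable.
    static String fmt(String pattern, Object... args) {
        return String.format(Locale.US, pattern, args);
    }

    public static void main(String[] args) {
        System.out.println(fmt("%2$d + %1$d", 123, 456)); // arg_index:          456 + 123
        System.out.println(fmt("%-6d|", 42));             // "-" flag, width 6:  42    |
        System.out.println(fmt("%+d", 42));               // "+" flag:           +42
        System.out.println(fmt("%07d", 42));              // "0" flag, width 7:  0000042
        System.out.println(fmt("%,d", 1234567));          // "," flag:           1,234,567
        System.out.println(fmt("%(d", -42));              // "(" flag:           (42)
        System.out.println(fmt("%8.2f", 3.14159));        // width 8, precision 2: "    3.14"
        System.out.println(fmt("%b %c %s", true, 'x', "hi")); // conversions:    true x hi

        // printf() and format() on System.out accept the same format strings.
        System.out.printf("%07d%n", 42);  // prints 0000042
        System.out.format("%07d%n", 42);  // prints 0000042
    }
}
```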




