Programming and my thoughts

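The egrep command on Linux (equivalent to grep -E) is a very handy way to search file contents with regular expressions.
Since egrep is documented in plenty of other places,
I'm simply posting here, for reference, real examples that I have used.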

Reference (GNU grep manual, Character Classes and Bracket Expressions):
https://www.gnu.org/software/grep/manual/html_node/Character-Classes-and-Bracket-Expressions.html

‘[:alnum:]’

Alphanumeric characters: ‘[:alpha:]’ and ‘[:digit:]’; in the ‘C’ locale and ASCII character encoding, this is the same as ‘[0-9A-Za-z]’.

‘[:alpha:]’

Alphabetic characters: ‘[:lower:]’ and ‘[:upper:]’; in the ‘C’ locale and ASCII character encoding, this is the same as ‘[A-Za-z]’.

‘[:blank:]’

Blank characters: space and tab.

‘[:cntrl:]’

Control characters. In ASCII, these characters have octal codes 000 through 037, and 177 (DEL). In other character sets, these are the equivalent characters, if any.

‘[:digit:]’

Digits: 0 1 2 3 4 5 6 7 8 9.

‘[:graph:]’

Graphical characters: ‘[:alnum:]’ and ‘[:punct:]’.

‘[:lower:]’

Lower-case letters; in the ‘C’ locale and ASCII character encoding, this is a b c d e f g h i j k l m n o p q r s t u v w x y z.

‘[:print:]’

Printable characters: ‘[:alnum:]’, ‘[:punct:]’, and space.

‘[:punct:]’

Punctuation characters; in the ‘C’ locale and ASCII character encoding, this is ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~.

‘[:space:]’

Space characters: in the ‘C’ locale, this is tab, newline, vertical tab, form feed, carriage return, and space. See the Usage section of the GNU grep manual for more discussion of matching newlines.

‘[:upper:]’

Upper-case letters: in the ‘C’ locale and ASCII character encoding, this is A B C D E F G H I J K L M N O P Q R S T U V W X Y Z.

‘[:xdigit:]’

Hexadecimal digits: 0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f.
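To see a few of these classes in action, here is a small shell sketch. The input lines are made up for illustration, grep -E stands in for egrep (the two are equivalent), and LC_ALL=C is set so that each class means exactly the ASCII set listed above.

```shell
#!/bin/sh
# Hypothetical sample lines; LC_ALL=C pins each class to the ASCII sets above.
export LC_ALL=C
sample=$(printf 'abc123\nDEADbeef\nhello world\ntab\there\n!?#\n')

# Whole line made of hexadecimal digits only ([:xdigit:]).
hex_only=$(printf '%s\n' "$sample" | grep -E '^[[:xdigit:]]+$')

# At least one space or tab anywhere in the line ([:blank:]).
with_blank=$(printf '%s\n' "$sample" | grep -E '[[:blank:]]')

# Whole line made of punctuation only ([:punct:]).
punct_only=$(printf '%s\n' "$sample" | grep -E '^[[:punct:]]+$')

printf '%s\n' "$hex_only" '---' "$with_blank" '---' "$punct_only"
```

Note that a class name is only valid inside a bracket expression, which is why each one is written with double brackets, e.g. [[:xdigit:]].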

‘]’

ends the bracket expression if it’s not the first list item. So, if you want to make the ‘]’ character a list item, you must put it first.

‘[.’

represents the open collating symbol.

‘.]’

represents the close collating symbol.

‘[=’

represents the open equivalence class.

‘=]’

represents the close equivalence class.

‘[:’

represents the open character class symbol, and should be followed by a valid character class name.

‘:]’

represents the close character class symbol.

‘-’

represents the range if it’s not first or last in a list or the ending point of a range.

‘^’

represents the characters not in the list. If you want to make the ‘^’ character a list item, place it anywhere but first.
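The placement rules for ']', '-' and '^' are easiest to check on tiny inputs. A minimal sketch with made-up lines, again using grep -E and the C locale:

```shell
#!/bin/sh
# Hypothetical sample lines exercising the special cases listed above.
export LC_ALL=C
sample=$(printf 'a]b\nx-y\n2^3\nplain\n')

# ']' placed first and '-' placed last: both are literal list items.
bracket_or_dash=$(printf '%s\n' "$sample" | grep -E '[]-]')

# '^' anywhere but first is just a literal caret.
caret=$(printf '%s\n' "$sample" | grep -E '[_^]')

# '^' first negates the list; the '-' right after it is literal,
# so this keeps lines that contain no '-' at all.
no_dash=$(printf '%s\n' "$sample" | grep -E '^[^-]+$')

printf '%s\n' "$bracket_or_dash" '---' "$caret" '---' "$no_dash"
```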


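egrep "V11X[[:alnum:][:punct:] ]+ \*\*CRITICAL ERROR\*\*" *.log | rev | cut -d'_' -f 1 | rev | sort | uniq > test.out

egrep "V11X[[:alnum:][:punct:] ]+ \*\*CRITICAL ERROR\*\*" *.log
searches the *.log files for V11X, followed by one or more alphanumeric, punctuation, or space characters, followed by a space and the literal text **CRITICAL ERROR**. The asterisks are escaped as \* because an unescaped * is a quantifier ("zero or more of the preceding item") and would not require literal asterisks in the line...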

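rev
reverses the characters on each line,

cut -d'_' -f 1
splits each line on '_' (underscore) and keeps the first field,

rev
then reverses that field back again. Taken together, rev | cut -f 1 | rev extracts the text after the last underscore on each line.

sort
sorts the lines (in the current locale's collation order),

uniq
removes duplicates (uniq only drops adjacent duplicate lines, which is why sort comes first),

> test.out
and redirects the result to test.out.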
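The whole pipeline can be tried on throwaway files. In this sketch the two .log files, their contents, and the sensor_* suffixes are hypothetical; only the pattern and the rev | cut | rev | sort | uniq chain come from the example above. As an alternative technique, it also shows awk -F'_' '{ print $NF }' with sort -u, which should give the same result without rev.

```shell
#!/bin/sh
# Hypothetical sample logs; only the pattern and the pipeline come from the post.
export LC_ALL=C                      # byte-wise, deterministic sort order
logdir=$(mktemp -d)
printf '%s\n' \
  'V11X unit 3 **CRITICAL ERROR** sensor_TEMP01' \
  'V11X unit 4 ok sensor_TEMP02' > "$logdir/a.log"
printf '%s\n' \
  'V11X unit 7 **CRITICAL ERROR** sensor_PRESS02' \
  'V11X unit 9 **CRITICAL ERROR** sensor_TEMP01' \
  'V11X unit 5 CRITICAL ERROR sensor_FLOW03' > "$logdir/b.log"

# grep -E behaves like egrep; \*\* matches two literal asterisks, so the
# FLOW03 line (no asterisks) is not selected.
pattern='V11X[[:alnum:][:punct:] ]+ \*\*CRITICAL ERROR\*\*'

# With several files grep prefixes "file.log:"; rev | cut -f 1 | rev keeps
# only the text after the LAST underscore.
result=$(cd "$logdir" && grep -E "$pattern" *.log |
  rev | cut -d'_' -f 1 | rev | sort | uniq)

# Alternative without rev: awk's $NF is the last '_'-separated field,
# and sort -u folds sort | uniq into one step.
result_awk=$(cd "$logdir" && grep -E "$pattern" *.log |
  awk -F'_' '{ print $NF }' | sort -u)

printf '%s\n' "$result"
rm -rf "$logdir"
```

With this sample data both variables hold PRESS02 and TEMP01 (one per line); TEMP01 appears twice in the logs but only once in the output because of uniq.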

egrep -r "V9,[[:alnum:], ]+ serialNumbers" ./ | rev | cut -d':' -f 1 | rev > V9.JUL17~DEC17.TXT

egrep -r "V9,[[:alnum:], ]+ serialNumbers" ./
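searches for that pattern under ./, recursively including all subdirectories (grep -r prefixes each match with its file path and a colon),

rev
reverses each line of the extracted output,

cut -d':' -f 1
splits on ':' (colon) and keeps the first field,

rev
reverses it again, which leaves the text after the last colon on each line,

> V9.JUL17~DEC17.TXT
and writes the result to the file V9.JUL17~DEC17.TXT.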
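A sketch of the recursive version on a made-up directory tree (the logs/jul and logs/dec directories, the file names, and the serial numbers are all hypothetical). One detail worth knowing: the shell creates the output file before grep starts, so if that file lives inside the searched directory, grep -r may read it while the pipeline is still writing. The sketch therefore writes it one level up. sed 's/.*://' is shown as an alternative to rev | cut | rev.

```shell
#!/bin/sh
# Hypothetical directory tree and file contents, for illustration only.
export LC_ALL=C
root=$(mktemp -d)
mkdir -p "$root/logs/jul" "$root/logs/dec"
printf '%s\n' 'V9, line 2 serialNumbers:SN1001' \
              'V8, line 1 serialNumbers:SN0999' > "$root/logs/jul/day1.txt"
printf '%s\n' 'V9, line 5 serialNumbers:SN1042' > "$root/logs/dec/day9.txt"

pattern='V9,[[:alnum:], ]+ serialNumbers'

# grep -r prints "path:matched line"; rev | cut -d':' -f 1 | rev keeps the
# text after the LAST colon. The output file sits outside the searched tree,
# so grep -r cannot pick it up while the pipeline is still writing it.
(cd "$root/logs" && grep -E -r "$pattern" ./ | rev | cut -d':' -f 1 | rev) \
  > "$root/V9.JUL17~DEC17.TXT"
serials=$(cat "$root/V9.JUL17~DEC17.TXT")

# Same extraction without rev: sed's greedy .* runs up to the last colon.
serials_sed=$(cd "$root/logs" && grep -E -r "$pattern" ./ | sed 's/.*://')

printf '%s\n' "$serials"
rm -rf "$root"
```

The order of the output lines follows grep's directory traversal, which is not guaranteed to be sorted; append sort if the order matters.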

PATTERN.TXT (excerpt):
this.pattern_Type = "Spiral_OD";    D:\Projects\XXX\London - Log XXX\London - XXX\GGG\XXX\SDF.cs    27    18    A
this.pattern_Type = "Short_Scratch_ID";    D:\Projects\XXX\London - Log XXX\London - GGG\GG\XXX\SDF.cs    27    18    B
this.pattern_Type = "Polynomial";    D:\Projects\XXX\London - Log XXX\London - XXX\GGG\XXX\DFDF.cs    29    18    C
... (omitted)
 
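egrep "\"[[:alnum:][:punct:] ]+" PATTERN.TXT | cut -d'=' -f 2 | cut -d';' -f 1 | sort | uniq
selects lines containing a double quote followed by text, keeps the part after '=' (cut -f 2), then the part before the first ';' (cut -f 1), and sorts and de-duplicates the names.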
 
(Result)
"Spiral_OD"
"Short_Scratch_ID"
"Polynomial"
cs
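For reference, here is the PATTERN.TXT example reproduced with only the three sample lines shown above (the omitted lines are left out, so no "cs" line appears). The sed 's/^ *//' step is an addition of this sketch: cut -d'=' -f 2 keeps the space that follows '=', so without it every name would start with a space. LC_ALL=C fixes the sort order.

```shell
#!/bin/sh
# Rebuild PATTERN.TXT from the three sample lines shown above.
export LC_ALL=C
tmp=$(mktemp -d)
printf '%s\n' \
  'this.pattern_Type = "Spiral_OD";    D:\Projects\XXX\London - Log XXX\London - XXX\GGG\XXX\SDF.cs    27    18    A' \
  'this.pattern_Type = "Short_Scratch_ID";    D:\Projects\XXX\London - Log XXX\London - GGG\GG\XXX\SDF.cs    27    18    B' \
  'this.pattern_Type = "Polynomial";    D:\Projects\XXX\London - Log XXX\London - XXX\GGG\XXX\DFDF.cs    29    18    C' \
  > "$tmp/PATTERN.TXT"

# cut needs -f to pick a field: -f 2 after '=' is the text following '=',
# -f 1 after ';' is the text before the first ';'.
# sed (added here) strips the leading space left over after '='.
names=$(grep -E '"[[:alnum:][:punct:] ]+' "$tmp/PATTERN.TXT" |
  cut -d'=' -f 2 | cut -d';' -f 1 | sed 's/^ *//' | sort | uniq)

printf '%s\n' "$names"
rm -rf "$tmp"
```

With these three lines, names ends up as "Polynomial", "Short_Scratch_ID" and "Spiral_OD", one per line in that order.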




More examples to come.