Pattern Analysis

Regular expressions are often used to validate special formats such as postal codes, telephone numbers or product numbers. The Data Profiling Task can generate such expressions for any string column through its Column Pattern Profile. Within the data flow, regular expressions can be processed quite well with free components such as RegExClean, RegExtractor or Data Validation Transform.
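For comparison, a regular-expression check of this kind can also be written directly in a Script Component with the .NET `Regex` class instead of a third-party component. The following standalone VB.NET module is a minimal sketch. The module name, the function `IsValidProductNumber` and the format itself (two uppercase letters, a hyphen, four digits) are hypothetical examples, not rules taken from real data.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexCheckSketch

    ' Hypothetical format: two uppercase letters, a hyphen, four digits (e.g. "AB-1234").
    Private ReadOnly ProductNumberFormat As New Regex("^[A-Z]{2}-[0-9]{4}$")

    ' Returns True when the value matches the expected format.
    ' NULL (Nothing) and empty values are treated as invalid.
    Public Function IsValidProductNumber(ByVal value As String) As Boolean
        If String.IsNullOrEmpty(value) Then Return False
        Return ProductNumberFormat.IsMatch(value)
    End Function

    Sub Main()
        For Each candidate As String In New String() {"AB-1234", "AB1234", "ab-1234", "AB-12345"}
            Console.WriteLine("{0,-10} {1}", candidate, IsValidProductNumber(candidate))
        Next
    End Sub

End Module
```

Only the first candidate matches; the others fail because of the missing hyphen, the lowercase letters (the match is case-sensitive) or the extra digit.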

Pattern analysis is another way to check special formats. It replaces every character of a string that belongs to a particular group, such as digits, lowercase letters or uppercase letters, with one shared symbol for that group. Values with the same structure therefore produce the same pattern; for example, "AB-1234" becomes "uu-dddd".
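The idea can be illustrated with a deliberately reduced mapping. This standalone sketch (hypothetical module and function names) uses the .NET `Char` classification methods and only distinguishes digits (`d`), uppercase letters (`u`), lowercase letters (`l`) and spaces (`_`); every other character is kept unchanged. Unlike a table based on the character codes 65-90 and 97-122, `Char.IsUpper` and `Char.IsLower` also treat accented letters such as `Ä` as letters.

```vbnet
Imports System
Imports System.Text

Module SimplePatternSketch

    ' Simplified mapping for illustration only:
    ' digit -> "d", uppercase letter -> "u", lowercase letter -> "l",
    ' space -> "_", every other character is kept unchanged.
    Public Function ToPattern(ByVal value As String) As String
        If value Is Nothing Then Return Nothing
        Dim sb As New StringBuilder(value.Length)
        For Each ch As Char In value
            If Char.IsDigit(ch) Then
                sb.Append("d"c)
            ElseIf Char.IsUpper(ch) Then
                sb.Append("u"c)
            ElseIf Char.IsLower(ch) Then
                sb.Append("l"c)
            ElseIf ch = " "c Then
                sb.Append("_"c)
            Else
                sb.Append(ch)
            End If
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        Console.WriteLine(ToPattern("FR-R92B-58"))   ' uu-uddu-dd
        Console.WriteLine(ToPattern("Ab 12"))        ' ul_dd
    End Sub

End Module
```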

The following Script Component code, for example, creates a pattern for each value of the column “ProductNumber” and writes it to the column “Pattern”.

 

    ' SSIS Script Component (transformation): ProductNumber is an input column,
    ' Pattern is a string output column of the same synchronous output.
    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

        ' NULL or empty input: write NULL to Pattern. Check _IsNull first,
        ' because reading a NULL string column throws an exception.
        If Row.ProductNumber_IsNull OrElse String.IsNullOrEmpty(Row.ProductNumber) Then
            Row.Pattern_IsNull = True
            Return
        End If

        Dim strInput As String = Row.ProductNumber
        Dim sbPattern As New System.Text.StringBuilder(strInput.Length)

        ' Replace every character with the symbol of its character group.
        For Each ch As Char In strInput
            sbPattern.Append(GetPatternSymbol(Asc(ch)))
        Next

        Row.Pattern = sbPattern.ToString()

    End Sub

    ' Maps a character code to the symbol of its character group.
    ' Asc uses the current ANSI code page (e.g. Windows-1252, where 163 is the pound sign).
    Private Function GetPatternSymbol(ByVal intASCII As Integer) As String
        Select Case intASCII
            Case 0, 1 : Return "z"
            Case 9 : Return "t"                                 ' tab
            Case 10, 13 : Return "c"                            ' line feed, carriage return
            Case 2 To 8, 11, 12, 14 To 31, 127 : Return "?"     ' other control characters
            Case 32 : Return "_"                                ' space
            Case 34, 39, 96 : Return "q"                        ' quotes: " ' `
            Case 36, 163 : Return "m"                           ' currency: $ and pound sign
            Case 43 : Return "+"
            Case 45 : Return "-"
            Case 44, 46, 47, 58, 59, 92, 95, 124 : Return "S"   ' separators: , . / : ; \ _ |
            Case 48 To 57 : Return "d"                          ' digits 0-9
            Case 65 To 90 : Return "u"                          ' uppercase A-Z
            Case 97 To 122 : Return "l"                         ' lowercase a-z
            Case 33, 35, 37, 38, 40 To 42, 60 To 64, 91, 93, 94, 123, 125, 126 : Return "p"   ' other punctuation
            Case Else : Return "!"   ' 128-255 (except 163), plus any value outside 0-255 that Asc can return on double-byte code pages
        End Select
    End Function

The distribution of the values across the different patterns can be displayed quite well with a data viewer. This makes errors and deviating formats quick and easy to spot.
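Outside of SSIS, the same kind of distribution can be computed with a few lines of LINQ, which can be handy for checking a data sample before building the package. A minimal sketch, assuming a simplified digit/uppercase/lowercase mapping (not the full mapping of the script component) and hypothetical sample values; the module and function names are illustrative only.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text

Module PatternDistributionSketch

    ' Simplified mapping: digit -> d, uppercase -> u, lowercase -> l, others unchanged.
    Private Function ToPattern(ByVal value As String) As String
        Dim sb As New StringBuilder(value.Length)
        For Each ch As Char In value
            If Char.IsDigit(ch) Then
                sb.Append("d"c)
            ElseIf Char.IsUpper(ch) Then
                sb.Append("u"c)
            ElseIf Char.IsLower(ch) Then
                sb.Append("l"c)
            Else
                sb.Append(ch)
            End If
        Next
        Return sb.ToString()
    End Function

    ' Counts how many values share each pattern, most frequent first
    ' (ties ordered by pattern text). NULL values are skipped.
    Public Function PatternDistribution(ByVal values As IEnumerable(Of String)) As List(Of KeyValuePair(Of String, Integer))
        Return values.Where(Function(v) v IsNot Nothing) _
                     .GroupBy(Function(v) ToPattern(v)) _
                     .Select(Function(g) New KeyValuePair(Of String, Integer)(g.Key, g.Count())) _
                     .OrderByDescending(Function(p) p.Value) _
                     .ThenBy(Function(p) p.Key, StringComparer.Ordinal) _
                     .ToList()
    End Function

    Sub Main()
        ' Hypothetical sample values, not taken from a real table.
        Dim sample As String() = {"FR-R92B-58", "FR-R92R-62", "BK-M68B-42", "bk-m68b-42", "FR R92B 58"}
        For Each entry As KeyValuePair(Of String, Integer) In PatternDistribution(sample)
            Console.WriteLine("{0,-12} {1}", entry.Key, entry.Value)
        Next
    End Sub

End Module
```

The three values in the expected format share one pattern, while the lowercase value and the value with spaces each stand out with a pattern of their own.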


With a Conditional Split transformation, the product numbers could then be separated by pattern and processed individually.
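In the Conditional Split itself, each output is defined by an SSIS expression such as `Pattern == "uu-uddu-dd"`. The same routing decision is sketched here in VB.NET, for example to test the rules outside the package. The output names and the accepted patterns are hypothetical.

```vbnet
Imports System

Module PatternRoutingSketch

    ' Hypothetical routing rules: which output a row would go to, based on its pattern.
    ' In an SSIS package this decision is normally made by a Conditional Split.
    Public Function RouteByPattern(ByVal pattern As String) As String
        If pattern Is Nothing Then Return "NullValues"
        Select Case pattern
            Case "uu-uddu-dd"
                Return "StandardFormat"
            Case "uu-dddd"
                Return "ShortFormat"
            Case Else
                Return "Review"   ' every unexpected pattern goes to manual review
        End Select
    End Function

    Sub Main()
        For Each p As String In New String() {"uu-uddu-dd", "uu-dddd", "ll-lddl-dd", Nothing}
            Console.WriteLine("{0,-12} -> {1}", If(p, "(NULL)"), RouteByPattern(p))
        Next
    End Sub

End Module
```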

Processing large volumes of data is certainly faster with regular expressions. Writing regular expressions can be fairly complex, however, and pattern analysis may be the easier approach in that case.
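The two approaches can also be combined: a pattern found during the analysis can serve as the starting point for a regular expression. This sketch assumes a pattern alphabet reduced to `d`, `u` and `l`, with every other character matched literally. A fuller mapping that collapses several characters into symbols such as `S` or `p` would need character classes instead. The names are hypothetical.

```vbnet
Imports System
Imports System.Text
Imports System.Text.RegularExpressions

Module PatternToRegexSketch

    ' Translates a pattern (d = digit, u = uppercase, l = lowercase,
    ' any other character = itself) into an anchored regular expression.
    Public Function PatternToRegex(ByVal pattern As String) As String
        Dim sb As New StringBuilder("^")
        For Each ch As Char In pattern
            Select Case ch
                Case "d"c : sb.Append("[0-9]")
                Case "u"c : sb.Append("[A-Z]")
                Case "l"c : sb.Append("[a-z]")
                Case Else : sb.Append(Regex.Escape(ch.ToString()))   ' literal character
            End Select
        Next
        sb.Append("$")
        Return sb.ToString()
    End Function

    Sub Main()
        Dim expr As String = PatternToRegex("uu-uddu-dd")
        Console.WriteLine(expr)
        Console.WriteLine(Regex.IsMatch("FR-R92B-58", expr))   ' True
        Console.WriteLine(Regex.IsMatch("FR-R92-58", expr))    ' False
    End Sub

End Module
```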