How to customize password generation

You can use the standard .NET method:

[Reflection.Assembly]::LoadWithPartialName("System.Web")
[System.Web.Security.Membership]::GeneratePassword(8,2) # (length, number of non-alphanumeric characters)

In that case you get a simple call, but the unpredictable range of symbols it produces can conflict with your password policy.

Or you can customise it with your own logic. For example, suppose you need at least one character from each of the ranges a-z, A-Z and 0-9, and you also want to control the length of the generated password. Here is a code snippet:

# 0-9, A-Z, a-z
$sourcedata = @()
48..57 + 65..90 + 97..122 | ForEach-Object { $sourcedata += ,[char][byte]$_ }
$charsLower=97..122 | %{ [Char] $_ }
$charsUpper=65..90 | %{ [Char] $_ }
$charsNumber=48..57 | %{ [Char] $_ }
Function Test-StringContents([String] $test, [Char[]] $chars)
{
    foreach ($char in $test.ToCharArray())
    {
        if ($chars -ccontains $char)
        {
            return $true
        }
    }
    return $false
}
Function GET-Temppassword() {
    Param([int]$length=10)
    $TempPassword = ""
    For ($loop=1; $loop -le $length; $loop++) {
        $TempPassword += ($sourcedata | Get-Random)
    }
    if( (Test-StringContents $TempPassword $charsLower) `
        -and (Test-StringContents $TempPassword $charsUpper) `
        -and (Test-StringContents $TempPassword $charsNumber) ){
        return $TempPassword
    }
    else {
        return GET-Temppassword -length $length
    }
}

Here are some tests:

GET-Temppassword -length 7
GET-Temppassword -length 12
GET-Temppassword # default length is 10 symbols

Posted in PowerShell

Oh My Search

I really love search. It is part of my life. I use it every day as a way of gaining knowledge. And this is not about Google.

The trickiest one is corporate search. It varies, it is not consistent, and it depends: it depends on the type of resources we need to index. Typically, these resources are files, data and applications, which search engines transform into indexes.

That is almost it. But the variety of resource types requires different approaches. Some files and data records need additional processing before indexing. Others mutate and jump from one location or scope to another, making search indexes obsolete and out of date. Real-time changes do not automatically trigger a refresh of the search indexes.

So how can we survive in the wilds of corporate search? As always, by adding another level of abstraction. The process is divided into phases: preparation, collecting, presenting.

Let’s assume that we have shared folders with files as the resource type. In the Windows world, shared files are presented by a UNC path, like \\company\shared\folder\ . As a rule, corporate shared folders are backed by the Distributed File System (DFS), which hides the physical implementation and presents file resources as mounted roots. But that is not essential; let’s concentrate on the UNC path. The UNC root is the entry point for file search: the search engine runs through a collection of UNC roots and harvests data for indexing.

The next question is: how does the search engine know how to extract file content for indexing? The answer: by file extension (or type). Text-based files (.txt, .csv, .rtf) are the easy ones, but the rest require more effort. For example, .pdf files can contain either text or scanned images, and some search filters (like Solr’s post.jar) are not able to process scanned PDF files. Other files require conversion before their content can be read (legacy MS Office .doc, .ppt and .xls need to be converted to .docx, .pptx and .xlsx). Many .jpg files carry plenty of attributes in EXIF format. And last but not least, content needs to be cleaned and compressed before indexing.
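The extension-based dispatch described above can be sketched as a simple lookup table. The handler functions below are hypothetical placeholders for illustration (sketched in Python rather than my actual PowerShell scripts), not part of any real pipeline:

```python
import os

# Hypothetical handlers: each one turns a file of a given family into plain text.
def read_plain_text(path):          # .txt, .csv, .rtf: read directly
    return "text:" + path

def ocr_or_extract_pdf(path):       # .pdf may hold text or scanned images
    return "pdf:" + path

def convert_then_read_office(path): # legacy .doc/.ppt/.xls: convert first
    return "office:" + path

def read_exif_attributes(path):     # .jpg metadata comes in EXIF format
    return "exif:" + path

# Extension -> handler lookup, as described in the text.
HANDLERS = {
    ".txt": read_plain_text, ".csv": read_plain_text, ".rtf": read_plain_text,
    ".pdf": ocr_or_extract_pdf,
    ".doc": convert_then_read_office, ".ppt": convert_then_read_office,
    ".xls": convert_then_read_office,
    ".jpg": read_exif_attributes,
}

def extract_content(path):
    """Pick the content extractor by file extension; unknown types are skipped."""
    ext = os.path.splitext(path)[1].lower()
    handler = HANDLERS.get(ext)
    return handler(path) if handler else None
```

A real pipeline would plug genuine extractors (OCR, Office converters, EXIF readers) into the same table.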

So, the raw data (the original file) should be transformed into search-ready data in a staging format. A .json file is the best candidate for that: it is a de-facto standard that allows content to be delivered to a number of search systems. We generate the json file in a hidden sub-folder. That is the key moment: when the parent folder is renamed, the content prepared for indexing is not lost. If the file changes, the .json copy is rebuilt based on the file attributes. That is a kind of decoupling between the source data and the destination index. I use separate PowerShell scripts for collecting json files and for indexing. This avoids a many-to-many (M x N) interaction in favour of many-to-one (M + N): in other words, I can deliver a json file to many search engines without the search engines interacting directly with the original resource. In practice, the whole (M + N) search process runs 5-10 times faster than (M x N).
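As an illustration of the staging idea (the hidden folder name and json field names below are my own assumptions, sketched in Python rather than the actual PowerShell scripts), here is a minimal function that writes a .json sidecar into a hidden sub-folder next to the source file:

```python
import json
import os

STAGING_DIR = ".search"  # hypothetical hidden sub-folder name

def stage_for_indexing(path, extracted_text):
    """Write a search-ready .json copy of a file into a hidden sub-folder.

    The sidecar lives next to the source file, so renaming the parent
    folder moves both together and the prepared content is not lost.
    """
    folder, name = os.path.split(path)
    staging = os.path.join(folder, STAGING_DIR)
    os.makedirs(staging, exist_ok=True)
    stat = os.stat(path)
    doc = {
        "id": name,
        "modified": stat.st_mtime,  # used to decide when to rebuild the copy
        "content": extracted_text,
    }
    target = os.path.join(staging, name + ".json")
    with open(target, "w", encoding="utf-8") as f:
        json.dump(doc, f)
    return target
```

An indexing script can then sweep up all `.search` folders and push the json documents to any number of engines, which is exactly the M + N decoupling described above.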

The full picture, sitting in my head, is a little bit frustrating. But it works and looks quite representative:

Search.UML.png

I am going to present this concept in detail to our local SQL Server User Group soon.

Posted in Search

Join Arrays

In SQL we use joins day by day. But what if we need to intersect and sort 2D arrays directly in C#? Here is my example of how to join persons and donuts by donut type, including "*" for a person who likes all types 🙂 :

using System;
using System.Linq;

class Arrays
{
    public static void Join()
    {
        string[][] donutConstraintPairs = {
            new string[] { "cruller", "vegan" },
            new string[] { "eclair", "chocolate" },
            new string[] { "zefir", "chocolate" }
        };
        string[][] candidateConstraintPairs = {
            new string[] { "jose", "vegan" },
            new string[] { "john", "chocolate" },
            new string[] { "ian", "*" } // "*" means all donut types
        };
        var join = (from d in donutConstraintPairs
                        .Select(q => new { Name = q[0], Type = q[1] })
                    from c in candidateConstraintPairs
                        .Select(q => new { Name = q[0], Type = q[1] })
                    where d.Type == c.Type || c.Type == "*"
                    select new { Person = c.Name, Donut = d.Name })
            .OrderBy(e => e.Person)
            .ThenBy(e => e.Donut)
            .Select(e => new[] { e.Person, e.Donut })
            .ToArray();
        for (int i = 0; i < join.Length; i++)
        {
            Console.WriteLine("{0},{1}", join[i][0], join[i][1]);
        }
    }
}
Here are the results of calling Arrays.Join():
  • ian,cruller
  • ian,eclair
  • ian,zefir
  • john,eclair
  • john,zefir
  • jose,cruller
Posted in C#, SQL

Project Oxford: Computer Vision API pros & cons

The Microsoft project code-named Oxford has a number of interesting APIs. Let’s dig into one of them: OCR.

The Computer Vision API provides OCR conversion from images to text. It can be tested from here. It is really fast, but requires some understanding of the subject. For example, I have a coloured image with text

snip_20150923092422

Here is the result:

I don -t need 
l. Google 
my wife knows

Hmm, the last line is lost. But if you apply a charcoal transformation, removing some noise

snip_20150923092751

Here is the result:

I don-t need 
Go, gle 
my wife knows 
everything

Much better: the last line is picked up. Microsoft has the same OCR engine in OneNote, and it shows the same problem.

Pros:

  • Simple
  • Fast
  • Supports many languages and auto-detection (?)
  • REST based
  • Really interesting package of APIs and features

Cons:

  • Requires an Azure subscription, which means it is not free
  • Requires file preparation before load
  • Does not support vectorised images (SVG)
  • Does not understand image orientation

snip_20150923094654

  • Does not support photo EXIF attributes. This is a key point if you know what I mean
  • Recognition is not as good or as customizable as what you can achieve with free tools like Tesseract-OCR, ImageMagick, autotrace and Inkscape.

Posted in Uncategorized

All date formats in one query

-- The CTEs below generate the numbers 1..256; each number is then used as a
-- CONVERT style code, with the ranges that are not valid date/time styles filtered out.
WITH
N0 as (SELECT 1 as n UNION ALL SELECT 1)
,N1 as (SELECT 1 as n FROM N0 t1, N0 t2)
,N2 as (SELECT 1 as n FROM N1 t1, N1 t2)
,N3 as (SELECT 1 as n FROM N2 t1, N2 t2)
,nums as (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) as num FROM N3)
SELECT num, CONVERT(VARCHAR(30),GETDATE(),num) AS formatted
FROM nums
WHERE num <132
AND num NOT BETWEEN 15 AND 19
AND num NOT BETWEEN 26 AND 99
AND num NOT BETWEEN 115 AND 119
AND num NOT BETWEEN 122 AND 125
AND num NOT BETWEEN 128 AND 129

Posted in Uncategorized

OpenProj: free project management example

OpenProj is a project management application similar to Microsoft Project. It includes most of the features of Microsoft Project but is free. It can be used:

  • if you need to view a Microsoft Project file and do not have a licenced copy of Microsoft Project. Microsoft Project files can be opened and viewed in OpenProj.
  • if you are creating a work breakdown structure (WBS)

Here is a PowerShell example of how to export some SQL data as a WBS:

AdventureWorks_openproj

This approach allows you to offload task management to another tool, delegate it to the right person (a project manager) and send it back to the database when it is ready.
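OpenProj can open Microsoft Project XML files, so one way to round-trip SQL data is to emit a minimal Project XML document. The sketch below (in Python, for illustration only; the task rows are hypothetical, and a real exporter would fill in many more schema fields) shows the basic shape:

```python
import xml.etree.ElementTree as ET

NS = "http://schemas.microsoft.com/project"  # MS Project XML namespace

def rows_to_project_xml(rows):
    """Turn (name, outline_level) rows into a minimal Project XML document."""
    ET.register_namespace("", NS)
    project = ET.Element("{%s}Project" % NS)
    tasks = ET.SubElement(project, "{%s}Tasks" % NS)
    for uid, (name, level) in enumerate(rows, start=1):
        task = ET.SubElement(tasks, "{%s}Task" % NS)
        ET.SubElement(task, "{%s}UID" % NS).text = str(uid)
        ET.SubElement(task, "{%s}Name" % NS).text = name
        # OutlineLevel gives the WBS hierarchy depth.
        ET.SubElement(task, "{%s}OutlineLevel" % NS).text = str(level)
    return ET.tostring(project, encoding="unicode")

# Hypothetical WBS rows pulled from a database: (task name, outline level).
wbs = [("AdventureWorks import", 1), ("Extract", 2), ("Load", 2)]
xml_text = rows_to_project_xml(wbs)
```

After editing in OpenProj, the same element structure can be parsed back to feed the updated tasks into the database.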

Another example: you have many SQL jobs that implicitly depend on each other. How do you organise optimal schedules? Which job should be invoked next? You can control the flow within each job, step by step, but that is a sequential scenario with many time gaps. What if the jobs do not follow a sequential pattern and instead operate like a graph with many parallel and recursive entry points? What if there are many SQL servers with many jobs?

As a rule, each job has its own schedule; ETL jobs, for example, run within a maintenance time frame. The main problem is how to shrink that time frame, say from an 8-hour window down to 1 hour. I know the right answer is “it depends!”, but that is not an option. First of all, we need to make the dependencies between jobs explicit. Then we need to implement an “I’m done” notification. I have seen a number of implementations: alerts, email, events, invokers. It is up to you; just keep in mind one more option, described here.
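To make the idea of explicit dependencies concrete, here is a small sketch (in Python for illustration; the job names are made up) that takes a job dependency graph and computes execution “waves”: every job in a wave can run in parallel as soon as all of its predecessors have signalled “I’m done”:

```python
def schedule_waves(deps):
    """Group jobs into parallel waves; deps maps job -> set of jobs it waits for."""
    # Include jobs that appear only as prerequisites of other jobs.
    jobs = set(deps)
    for prereqs in deps.values():
        jobs |= set(prereqs)
    remaining = {j: set(deps.get(j, ())) for j in jobs}
    waves = []
    while remaining:
        # A job is ready when it has no unfinished prerequisites left.
        ready = sorted(j for j, p in remaining.items() if not p)
        if not ready:
            raise ValueError("cycle detected among jobs")
        waves.append(ready)
        for j in ready:
            del remaining[j]
        for p in remaining.values():
            p.difference_update(ready)
    return waves

# Hypothetical jobs: the load waits for two extracts; the report waits for the load.
deps = {"LoadDW": {"ExtractA", "ExtractB"}, "Report": {"LoadDW"}}
```

This is Kahn’s topological sort in wave form; replacing fixed schedules with such a graph is exactly what lets the window shrink, because each wave starts as soon as the previous one finishes rather than at a pre-set time.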

Happy coding!

Posted in PowerShell, SQL

How to highlight search words in text

Here is a quick example of how to highlight search words using full-text search in T-SQL. It also avoids highlighting stop words.
/*
Author: Andrew Butenko
Create date: 14-01-2015
Description: highlight search words in text, including word forms and excluding stop words
Example:

DECLARE @contents VARCHAR(MAX) = '123 defence ARMV-AT. NAVY-as, MC. - STRAIGHY, NEAR SOLDER CONTACTS. AN TYPE buiion 3: ', @searchTerm varchar(255) = 'defence navy'
SELECT [dbo].[HighLightSearchWords](@contents,@searchTerm)
*/
CREATE FUNCTION [dbo].[HighLightSearchWords]
(
@contents VARCHAR(8000),
@searchTerm varchar(255)
)
RETURNS VARCHAR(8000) AS
BEGIN
DECLARE @retval VARCHAR(8000) = ''
;WITH Forms AS (
SELECT DISTINCT prs.display_term
FROM sys.dm_fts_parser(N'FORMSOF(FREETEXT, "' + @searchTerm + '")', 1033, 0, 1) prs
LEFT OUTER JOIN [sys].[fulltext_stopwords] stw ON prs.display_term COLLATE Latin1_General_CI_AS = stw.[stopword] COLLATE Latin1_General_CI_AS
WHERE stw.[stopword] IS NULL
)
, Matches(display_term, start_pos) AS(
SELECT display_term, CHARINDEX(display_term, @contents, 0) as start_pos
FROM Forms
WHERE CHARINDEX(display_term, @contents, 0)>0
UNION ALL
SELECT m.display_term, CHARINDEX(m.display_term, @contents, m.start_pos+LEN(m.display_term)) as start_pos
FROM Matches AS m
WHERE CHARINDEX(m.display_term, @contents, m.start_pos+LEN(m.display_term))>0)
, Results AS(
SELECT ROW_NUMBER() OVER(ORDER BY start_pos) AS Id, start_pos, MAX(LEN(display_term)) as term_length
FROM Matches
GROUP BY start_pos)
SELECT @retval += (CASE l.Id WHEN 1 THEN SUBSTRING(@contents, 1, l.start_pos-1) ELSE '' END) --left
+'<b>' + SUBSTRING(@contents, l.start_pos, l.term_length)+'</b>' --middle (highlighted match)
+(CASE WHEN r.start_pos - (l.start_pos + l.term_length + 1) > 0 THEN SUBSTRING(@contents, l.start_pos + l.term_length +1, r.start_pos - (l.start_pos + l.term_length + 1) )
ELSE SUBSTRING(@contents, l.start_pos + l.term_length, LEN(@contents))
END) --right
FROM Results l
LEFT OUTER JOIN Results r ON l.Id = r.Id - 1
ORDER BY l.start_pos
RETURN @retval
END

Posted in SQL